async / stream support #195

dduponchel · 2015-01-08T21:03:18Z

This pull request replaces #141.
The current API is synchronous : if JSZip takes too much time to finish its task, the page crashes
(it freezes during the task anyway). This commit does the following :

- rewrite the code into workers which are asynchronous
- replace the public methods
- add nodejs stream support
- break the compatibility with existing code

Note : I created a branch jszip_v3 which I target : a change this big can't go
to the current v2.

Workers

A worker is like a nodejs stream but with some differences. On the good side :

- it works on IE 6-9 without any issue / polyfill
- it weighs less than the full dependencies bundled with browserify
- it forwards errors (no need to declare an error handler EVERYWHERE)

On the bad side :
It adds new code to maintain, fix, optimize.

A chunk is an object with 2 attributes : meta and data. The former is an
object containing anything (percent for example), see each worker for more
details. The latter is the real data (String, Uint8Array, etc).
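
To make the chunk shape and the error forwarding concrete, here is a minimal sketch. `MiniWorker` and its methods are hypothetical names invented for this illustration, not JSZip's actual worker class; the only parts taken from the description above are the `{ meta, data }` chunk and the idea that a single error handler at the end of the chain is enough.

```javascript
// Illustrative only: MiniWorker is a hypothetical stand-in, not JSZip's worker class.
function MiniWorker(transform) {
    this.transform = transform || function (chunk) { return chunk; };
    this.listeners = { data: [], error: [] };
}

MiniWorker.prototype.on = function (event, listener) {
    this.listeners[event].push(listener);
    return this;
};

MiniWorker.prototype.emit = function (event, payload) {
    this.listeners[event].forEach(function (listener) { listener(payload); });
};

// Transform a chunk and hand it to the listeners; a failure becomes an "error" event.
MiniWorker.prototype.push = function (chunk) {
    var result;
    try {
        result = this.transform(chunk);
    } catch (e) {
        this.emit("error", e);
        return;
    }
    this.emit("data", result);
};

// Chain two workers: both data and errors flow to the next one.
MiniWorker.prototype.pipe = function (next) {
    this.on("data", function (chunk) { next.push(chunk); });
    this.on("error", function (e) { next.emit("error", e); });
    return next;
};

var received = [];
var errors = [];
var source = new MiniWorker();
var upperCase = new MiniWorker(function (chunk) {
    if (typeof chunk.data !== "string") { throw new Error("string expected"); }
    return { data: chunk.data.toUpperCase(), meta: chunk.meta };
});

// One error handler, declared once at the end of the chain.
source.pipe(upperCase)
    .on("data", function (chunk) { received.push(chunk); })
    .on("error", function (e) { errors.push(e.message); });

source.push({ data: "hello", meta: { percent: 50 } });
source.push({ data: 42, meta: { percent: 100 } }); // triggers the error path
```

After these two calls, `received` holds one upper-cased chunk with its `meta` untouched, and `errors` holds the message of the failed transformation.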

Public API

I've updated my gh-pages with the latest API documentation.

Getters are replaced by nodeStream(type) and async(type), generate() is replaced by generateNodeStream and generateAsync. The "async" methods return a Promise, the "stream" methods return a Stream3 (from nodejs) if available.
They have an update callback to get access to metadata like the
name of the current file being compressed or the operation progress (with percent).

With this pull request, the API will be :

// generate
zip.generateAsync(...) // return a Promise of the content
zip.generateNodeStream(...) // return the node stream
zip.generateInternalStream(...) // return the internal stream to let the user write glue code

// getters
zip.async("arraybuffer") // return a Promise
zip.nodeStream("arraybuffer") // return a nodejs stream
zip.internalStream(...) // return the internal stream to let the user write glue code

The previous sync methods now throw exceptions.
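
As an illustration of the kind of glue code the internal stream is meant to allow, here is a sketch that turns an event-based stream into a Promise with an update callback. The event names (`data`, `end`, `error`) and the `accumulate` helper are assumptions made for this example; a plain Node.js `EventEmitter` stands in for the real stream, whose exact interface is not described here.

```javascript
var EventEmitter = require("events");

// Hypothetical glue: collect the chunks of an event-based stream into a Promise.
// onUpdate receives the metadata of each chunk (for example { percent: 50 }).
function accumulate(stream, onUpdate) {
    return new Promise(function (resolve, reject) {
        var parts = [];
        stream.on("data", function (chunk) {
            parts.push(chunk.data);
            if (onUpdate) { onUpdate(chunk.meta); }
        });
        stream.on("error", reject);
        stream.on("end", function () { resolve(parts.join("")); });
    });
}

// Stand-in for a real stream: we emit the chunks by hand.
var fakeStream = new EventEmitter();
var percents = [];
var result = accumulate(fakeStream, function (meta) { percents.push(meta.percent); });

fakeStream.emit("data", { data: "Hello ", meta: { percent: 50 } });
fakeStream.emit("data", { data: "World", meta: { percent: 100 } });
fakeStream.emit("end");

result.then(function (content) {
    console.log(content, percents);
});
```

The Promise resolves once with the whole content, while the update callback is called for every chunk, which mirrors the split between the "async" methods and their progress notifications.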

Nodejs stream support

file(name, data) now accepts a nodejs stream as data. The "nodeStream" methods generate streams.

Breaking changes

The undocumented JSZip.compressions object changed : the object now returns
workers to do the job, the previous methods are not used anymore.

A lot of sync methods have been removed.

The previous code used a closure to keep a reference to the compressed content and pointers to the beginning/end of the file in this content. Then we had some complicated code to extract the content and handle the corner cases. This new version adds two new ("private" with a _) methods to the ZipObject instances : _compress and _decompress. A ZipObject instance will have two states : compressed (with a CompressedObject) and decompressed (with a string, uint8array, etc). With this patch, no more closure : for each file we create a sub{string,array} for the data and then create a CompressedObject.

This commit addresses the timeout issue.

The current API is synchronous : if JSZip takes too much time to finish its task, the page crashes (it freezes during the task anyway). This commit does the following :

- rewrite the code into workers which can be asynchronous
- add the needed public methods
- add nodejs stream support
- break the compatibility with existing code

Workers
-------

A worker is like a nodejs stream but with some differences. On the good side :

- it works on IE 6-9 without any issue / polyfill
- it weighs less than the full dependencies bundled with browserify
- it forwards errors (no need to declare an error handler EVERYWHERE)

On the bad side : to get sync AND async methods on the public API without duplicating a lot of code, this class has an `isSync` attribute and some if/then to choose between doing stuff now, or using an async callback. It is dangerously close to releasing Zalgo (see http://blog.izs.me/post/59142742143/designing-apis-for-asynchrony for more).

A chunk is an object with 2 attributes : `meta` and `data`. The former is an object containing anything (`percent` for example), see each worker for more details. The latter is the real data (String, Uint8Array, etc).

Public API
----------

Each method generating data (generate, asText, etc) gains a stream sibling : generateStream, asTextStream, etc. This will need a solid discussion because I'm not really satisfied with this.

Nodejs stream support
---------------------

With this commit, `file(name, data)` accepts a nodejs stream as data. It also adds an `asNodejsStream()` method on the StreamHelper.

Breaking changes
----------------

The undocumented JSZip.compressions object changed : the object now returns workers to do the job, the previous methods are not used anymore.

Not broken yet, but the `checkCRC32` option (when loading a zip file, it synchronously checks the crc32 of every file) will need to be replaced.

dduponchel · 2015-04-06T11:53:45Z

Keeping the sync methods would be too much trouble, so I'll remove them.

That would give

zip.file("content.txt").async("uint8array", function onComplete(){...});
zip.file("content.txt").async({
    type: "uint8array",
    onUpdate: function () {...} // allow progress update notifications
}, function onComplete(){...});

zip.file("content.txt").stream("uint8array").on("data", ...); // our simple stream
zip.file("content.txt").nodeStream("uint8array").on("data", ...); // nodejs stream

zip.generateAsync({...}, function onComplete(){...});
zip.generateStream({...}).on("data", ...);
zip.generateNodeStream({...}).on("data", ...);

The old methods will throw an exception like "This method has been removed in JSZip 3.0, please check the upgrade guide.".
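
A sketch of how removed methods could fail loudly instead of silently disappearing. `FakeZipObject` and `removedMethod` are hypothetical names used only for this illustration; the error message is the one quoted above, and `asText`, `asBinary` and `asUint8Array` are getters from the 2.x API.

```javascript
var REMOVED_MESSAGE = "This method has been removed in JSZip 3.0, please check the upgrade guide.";

// Shared stub: any call to a removed method ends here.
function removedMethod() {
    throw new Error(REMOVED_MESSAGE);
}

// FakeZipObject is a hypothetical stand-in for the real file object.
function FakeZipObject(content) {
    this.content = content;
}

// The replacement getter returns a Promise (the requested type is ignored in this sketch).
FakeZipObject.prototype.async = function (type) {
    return Promise.resolve(this.content);
};

// Keep the old names around so that legacy code gets an explicit error.
["asText", "asBinary", "asUint8Array"].forEach(function (name) {
    FakeZipObject.prototype[name] = removedMethod;
});

var file = new FakeZipObject("Hello");
var message = null;
try {
    file.asText();
} catch (e) {
    message = e.message; // the upgrade hint
}
```

Keeping the stubs costs a few bytes but gives people migrating from v2 a pointer to the upgrade guide rather than a generic "undefined is not a function".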

I'll remove the synchronous .load() method too : a sync checkCRC32 option isn't possible anymore and an async method allows more things (loading from a Blob, from a nodejs file descriptor, etc).

For the async operations, ES6 Promises look like the right choice (instead of manually adding a callback) but that means adding more bytes in the build.

@Stuk Does this sound reasonable ?

andrewvarga · 2015-05-18T08:53:37Z

Is this going to be part of the stable release some time? (Or any other async (webworker) support?)

dduponchel · 2015-05-26T20:51:57Z

I'm still working on it... When I have enough time. I may have some time this weekend to continue working on the async load.

andrewvarga · 2015-05-26T21:05:11Z

Let me know if I can help, I could actually make it work in a webworker really easily by just adding some wrapper methods on top of jszip.min.js like:

var jsZip; // JSZip instance, kept between messages

self.onmessage = function(message) {
    var messageData = message.data;
    var command = messageData.command;
    var args = messageData.args;
    var requestId = messageData.requestId; // echoed back so the caller can match the reply

    switch (command) {
        case "load":
            jsZip = new JSZip();
            jsZip.load(args[0]);

            self.postMessage({
                requestId: requestId
            });
            break;
        case "parse":
            // ... etc
    }
};

This way I didn't need to touch the actual JSZip code at all, it just allows calling its functions in a webworker.
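
For completeness, the page side of such a wrapper could look like the sketch below. It is not taken from the comment above: `createWorkerClient` is a hypothetical helper, and a fake worker object replaces a real `Worker` so that the example runs anywhere. The only assumption carried over is the `{ command, args, requestId }` message shape and the fact that the worker echoes `requestId` in its reply.

```javascript
// Hypothetical main-thread counterpart: match each reply to its request via requestId.
function createWorkerClient(worker) {
    var nextRequestId = 0;
    var pending = {}; // requestId -> callback waiting for the reply

    worker.onmessage = function (message) {
        var requestId = message.data.requestId;
        var callback = pending[requestId];
        delete pending[requestId];
        if (callback) { callback(message.data); }
    };

    return function call(command, args, callback) {
        var requestId = nextRequestId++;
        pending[requestId] = callback; // register before posting the message
        worker.postMessage({ command: command, args: args, requestId: requestId });
    };
}

// Fake worker for the demonstration; in a browser this would be a real Worker instance.
var fakeWorker = {
    onmessage: null,
    postMessage: function (data) {
        // Replies immediately and echoes the requestId, like the "load" case.
        this.onmessage({ data: { requestId: data.requestId } });
    }
};

var call = createWorkerClient(fakeWorker);
var answered = [];
call("load", ["first zip content"], function (reply) { answered.push(reply.requestId); });
call("load", ["second zip content"], function (reply) { answered.push(reply.requestId); });
```

Each callback fires exactly once, with the reply carrying its own `requestId`, so several requests can be in flight at the same time.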

This commit removes most of the sync methods (everything but the load method).

The new getters are :
- zip.async(type) : es6 promises
- zip.stream(type) : nodejs Stream3
- zip.internalStream(type) : our internal stream (useful to create wrappers)

The new generate methods are :
- zip.generateAsync() : es6 promises
- zip.generateStream() : nodejs Stream3
- zip.generateInternalStream() : our internal stream

With the "stream" keyword, everyone will think of the nodejs stream : it's safer to use "internalStream" for our own. The old sync versions will throw an exception.

Also fix the stream version we use : we use Stream3, not the stream offered by the current node runtime.

Also changed in generate : prefer "binarystring" over "string". That way, JSZip#generate will be consistent with ZipObject#async. "string" is still supported because a zip as text doesn't make sense.

dduponchel · 2015-06-02T19:30:19Z

I've updated the pull request with async methods and I've updated the description. I've updated my gh-pages with the latest API documentation.
The next step will be to merge this with master.

@andrewvarga doesn't importScripts solve this issue ? Making JSZip Worker aware may give a strange API.

balaclark · 2015-07-11T09:38:30Z

Looks great, async is very necessary for large files.

dduponchel · 2015-07-18T12:11:16Z

Up-to-date with v2.5.0 (I moved the branch jszip_v3 to this tag). @Stuk are you ok with this pull request ?

Browsers are also adding streams, see [0], [1], [2]. While the spirit is the same, the API is different from nodejs streams. To avoid future confusion, I prefer renaming `generateStream` to `generateNodeStream` and `stream` to `nodeStream` (we already have `nodebuffer` as type).

[0]: https://streams.spec.whatwg.org/
[1]: https://blog.wanderview.com/blog/2015/06/19/intent-to-implement-streams-in-firefox/
[2]: https://www.chromestatus.com/features/5804334163951616

dduponchel · 2015-08-02T20:12:40Z

I've prefixed stream methods with "node": browsers are also adding streams, see 1, 2, 3. While the spirit is the same, the API is different from nodejs streams. To avoid future confusion, I prefer renaming generateStream to generateNodeStream and stream to nodeStream (we already have nodebuffer as type).

Stuk · 2015-08-05T17:18:39Z

lib/generate.js

+    return zipFileWorker;
+};
+
+/**


Should ZipFileWorker be put in its own file?

Stuk · 2015-09-12T22:51:59Z

👍 is this ready to be merged then?

dduponchel · 2015-09-13T15:00:04Z

@Stuk this pull request is ready, I'll continue on #224.

0cv · 2015-11-13T08:35:35Z

damned, this is amazing, seriously. v2.5 just failed and crashed after 1 minute or so on a few hundred files weighing between 10 KB and 500 KB each, and this new version just did the whole job in half a second! Unless there are some bugs (which I have not found in my few basic tests), it would make sense for it to become the new official version.

Small info: grunt (which I had to use to produce the dist v3.0) is somehow outdated with its uglify dependency and throws an error. Updating uglify to the latest version simply fixed the problem.

dduponchel added 22 commits January 7, 2015 22:06

optimize the call to file(name)

ba43b9c

Optimize the call to file(name) : instead of using filter (in O(n)), fetch the entry directly (in O(1)).
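
The difference can be sketched as follows, assuming the entries are stored in a plain object keyed by file name (the `files` object and both helper names are made up for this illustration):

```javascript
// Assumed storage: one property per entry, keyed by the full file name.
var files = {
    "readme.txt": { name: "readme.txt" },
    "docs/guide.txt": { name: "docs/guide.txt" }
};

// Before: walk every key and keep the matching one, O(n) in the number of entries.
function findWithFilter(files, name) {
    var matches = Object.keys(files).filter(function (key) {
        return key === name;
    });
    return matches.length ? files[matches[0]] : null;
}

// After: a single property lookup, O(1) on average.
function findDirect(files, name) {
    // hasOwnProperty avoids false hits on inherited names such as "toString".
    return Object.prototype.hasOwnProperty.call(files, name) ? files[name] : null;
}

var viaFilter = findWithFilter(files, "docs/guide.txt");
var viaLookup = findDirect(files, "docs/guide.txt");
```

Both helpers return the same entry; only the cost differs, which matters when `file(name)` is called in a loop over an archive with many entries.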

allow the JIT compilation of JSZip.base64

f4744b8

The index went too far, causing the optimizer to drop the compiled code.

function arrayLikeToString : allow JIT

398b648

Because of the try/catch, the JIT compiler can't do its magic. This patch moves some of the hot code into separate functions : those functions can be compiled, the try/catch remains in the main loop.
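
The pattern can be sketched like this. It is not the actual JSZip code: the function names, the default chunk size and the halving strategy are assumptions chosen for the illustration. The point is only the split between a small hot function without try/catch and an outer loop that keeps it (optimizing compilers of that era, such as V8's Crankshaft, refused to optimize functions containing a try/catch).

```javascript
// Hot path: no try/catch here, so the optimizing compiler can handle it.
function chunkToString(array, start, end) {
    return String.fromCharCode.apply(null, Array.prototype.slice.call(array, start, end));
}

// Outer loop: the try/catch stays here, around the call to the hot function.
function arrayLikeToStringSketch(array, chunkSize) {
    var parts = [];
    var chunk = chunkSize || 65536; // assumed default, for the illustration
    var k = 0;
    while (k < array.length) {
        try {
            parts.push(chunkToString(array, k, Math.min(k + chunk, array.length)));
            k += chunk;
        } catch (e) {
            // apply() can fail when given too many arguments: retry with a smaller chunk.
            if (chunk <= 1) { throw e; }
            chunk = Math.floor(chunk / 2);
        }
    }
    return parts.join("");
}

var fromArray = arrayLikeToStringSketch([104, 105]);
var fromTypedArray = arrayLikeToStringSketch(new Uint8Array([72, 105, 33]), 2);
```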

the filename is already read in the local part

7282f0d

unicode path/comment : skip a step

69c42cc

When loading a zip file with unicode path/comment, we were combining two costly operations : read data as (binary) string and then convert the binary string to a real string. This patch skips the first string conversion.

ZipEntry : avoid StringReader if possible

08d15f4

remove an useless string conversion

a031847

Inline function call.

68cae7d

read signature : don't call readString if possible

1e1a5e8

The transformation uint8array -> string (to check against the signature) can be costly if repeated for a lot of entries. This commit uses the reader to check the signature, speeding things with a Uint8ArrayReader.
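
A sketch of the idea, comparing bytes in place instead of building a string first. The helper name is hypothetical and the real reader classes are not reproduced here; the signature itself (`PK\x03\x04`) is the local file header signature from the zip format.

```javascript
var LOCAL_FILE_HEADER = "PK\x03\x04";

// Compare the bytes at `offset` with the signature, without any string conversion.
function hasSignatureAt(bytes, offset, signature) {
    for (var i = 0; i < signature.length; i++) {
        // Out-of-range reads give undefined, which never equals a char code.
        if (bytes[offset + i] !== signature.charCodeAt(i)) {
            return false;
        }
    }
    return true;
}

// One padding byte, then a local file header signature, then one more byte.
var data = new Uint8Array([0x00, 0x50, 0x4b, 0x03, 0x04, 0x14]);
var atOne = hasSignatureAt(data, 1, LOCAL_FILE_HEADER);
var atZero = hasSignatureAt(data, 0, LOCAL_FILE_HEADER);
```

The check stops at the first mismatching byte and allocates nothing, which is where the gain comes from when it is repeated for a lot of entries.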

reading zip : skip useless work

228d3c2

refactor : create writer/ and reader/

736df65

extra fields : pre-calculate end position

2de59af

rework a bit stringifyByChunk

30360ac

Rework ZipObject

4d4870e

Simplify a bit the generate() final conversion

71861e0

refactor : move the generate code to generate.js

6fa2c9c

The separation between the generation of the CompressedObjects and the actual use of them will help for the next commit, adding generateAsync().

Move logic into fileAdd.

394712e

The "content is empty" check can be done at the insertion of the content and not when compressing it.

replace the crc32 implementation with pako's

d8ab178

Update the Saucelabs dependencies / configuration

570ed22

This commit updates the saucelabs dependencies to the latest and updates the configuration :
- wait up to 10 minutes, IE 6 is slow
- launch max 3 parallel jobs
- add IE6, IE11, safari 7, safari 8

Update browserify

0efa846

Grunt-browserify added a `banner` option in v3.2.0, use it. Also ignore the lib/nodejs/* files : they only make sense with streams.

This was referenced Feb 17, 2015

Async methods #141

Closed

Question: Progress Update #175

Closed

Fix the nodejs bridge.

b4a8b2b

When paused, a nodejs stream CAN and WILL emit "end" events.

edi9999 mentioned this pull request Apr 1, 2015

Speed improvements for big json with many nested levels (my json is 500 000 character) open-xml-templating/docxtemplater#131

Closed

dduponchel added 2 commits June 2, 2015 20:04

[breaking change] Remove sync methods, part 2.

63635e1

This commit removes the sync "load()" method. It has been replaced by "loadAsync()" which returns a Promise.

dduponchel added 2 commits July 16, 2015 22:00

Merge tag 'v2.5.0' into async

b9aca03

JSZip 2.5.0

Use setImmediate instead of setTimeout.

954791e

setTimeout has a 4 ms minimum delay while, on a recent browser, a chunk takes ~0.2 ms of processing. Removing these 4 ms makes a huge difference. Main downside: the polyfill globally declares setImmediate.
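
A back-of-the-envelope estimate shows why. The two timings are the ones quoted in the commit message; the chunk count (10,000) and the idealized zero delay for setImmediate are assumptions made for this example.

```javascript
// Figures from the commit message.
var TIMER_MIN_DELAY_MS = 4;  // minimum delay added by setTimeout
var CHUNK_WORK_MS = 0.2;     // processing time of one chunk

// Total time when each chunk is scheduled separately.
function estimateTotalMs(chunkCount, delayPerChunkMs) {
    return chunkCount * (CHUNK_WORK_MS + delayPerChunkMs);
}

var chunkCount = 10000; // hypothetical workload
var withSetTimeout = estimateTotalMs(chunkCount, TIMER_MIN_DELAY_MS); // about 42 seconds
var withSetImmediate = estimateTotalMs(chunkCount, 0);                // about 2 seconds
var idleShare = (chunkCount * TIMER_MIN_DELAY_MS) / withSetTimeout;   // about 95% spent waiting
```

Under these assumptions, roughly 95% of the setTimeout run is spent waiting for timers rather than processing data.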

dduponchel mentioned this pull request Jul 18, 2015

What to put in JSZip v3 ? #224

Closed

dduponchel added 3 commits August 2, 2015 11:21

Add "text" as an alias of "string".

944d981

When I migrated the tests from `asText()` to `async("string")`, I typed a lot of `async("text")`. I strongly suspect that other users will have the issue too while migrating.
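
Such an alias can be resolved with a small lookup before the type is used. The table and the helper below are a sketch with made-up names, not the actual implementation:

```javascript
// Alias table: the key is what users may type, the value is the canonical type.
var TYPE_ALIASES = {
    text: "string"
};

// Return the canonical type; unknown names are passed through unchanged.
function resolveType(type) {
    return Object.prototype.hasOwnProperty.call(TYPE_ALIASES, type) ? TYPE_ALIASES[type] : type;
}

var aliased = resolveType("text");         // resolved to "string"
var untouched = resolveType("uint8array"); // returned as-is
```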

Fix upgrade_guide example.

aa035eb

Stuk reviewed Aug 5, 2015

dduponchel added 2 commits August 7, 2015 07:44

Extract ZipFileWorker in its own file.

b84be38

Split the generate.js file into generate/index.js and generate/ZipFileWorker.js.

Workers: propagate errors upward with ".error()".

058d391

The error event wasn't sent upward, the parent workers were left paused (or running). Also make ".pause()" and ".resume()" return boolean, extending the default behavior is easier with it.

dduponchel mentioned this pull request Aug 16, 2015

zipping 2 million small files #227

Open

Replace setimmediate2 with asap.

c286722

The module `asap` doesn't update the global scope. The only downside: we can't cancel the async call anymore. Instead, we now let the async call happen and return early if the stream is paused or finished.
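
The "let it happen and return early" approach can be sketched as below. `ChunkPump` and its fields are hypothetical names; the scheduler is injectable so the example does not depend on `asap` itself (the default uses `setTimeout` purely as a placeholder). `pause()` and `resume()` return booleans, as in the workers described earlier.

```javascript
// Hypothetical worker: emits one chunk per scheduled tick.
function ChunkPump(chunks, onData, schedule) {
    this.chunks = chunks;
    this.onData = onData;
    this.schedule = schedule || function (fn) { setTimeout(fn, 0); };
    this.index = 0;
    this.isPaused = true;
    this.isFinished = false;
    this.tickPending = false;
}

// Returns false when there is nothing to do (already paused or finished).
ChunkPump.prototype.pause = function () {
    if (this.isPaused || this.isFinished) { return false; }
    this.isPaused = true;
    return true;
};

ChunkPump.prototype.resume = function () {
    if (!this.isPaused || this.isFinished) { return false; }
    this.isPaused = false;
    this._scheduleTick();
    return true;
};

// Never keep more than one tick in flight, since a scheduled call can't be cancelled.
ChunkPump.prototype._scheduleTick = function () {
    var self = this;
    if (this.tickPending) { return; }
    this.tickPending = true;
    this.schedule(function () {
        self.tickPending = false;
        self._tick();
    });
};

ChunkPump.prototype._tick = function () {
    // The call was scheduled before a pause(): let it happen and return early.
    if (this.isPaused || this.isFinished) { return; }
    if (this.index >= this.chunks.length) {
        this.isFinished = true;
        return;
    }
    this.onData(this.chunks[this.index++]);
    this._scheduleTick();
};

// Demonstration with a manual queue standing in for the scheduler.
var queue = [];
var output = [];
var pump = new ChunkPump(["a", "b", "c"], function (c) { output.push(c); }, function (fn) { queue.push(fn); });

pump.resume();
queue.shift()();   // emits "a" and schedules the next tick
pump.pause();
queue.shift()();   // the pending tick runs but returns early
var afterPause = output.slice();
pump.resume();
while (queue.length) { queue.shift()(); }
```

The tick that was already queued when `pause()` was called still runs, but it does nothing; the remaining chunks are only emitted after the next `resume()`.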

Stuk added a commit that referenced this pull request Sep 13, 2015

Merge pull request #195 from dduponchel/async

c1260c8

async / stream support

Stuk merged commit c1260c8 into Stuk:jszip_v3 Sep 13, 2015

bitinn mentioned this pull request Oct 18, 2015

Use case for res.arrayBuffer node-fetch/node-fetch#51

Closed

dduponchel mentioned this pull request Feb 10, 2016

Async support #121

Closed

dduponchel deleted the async branch March 24, 2016 22:13

dduponchel mentioned this pull request Apr 12, 2016

Release 3.0.0 #278

Merged

takahirox mentioned this pull request Sep 30, 2019

3MFLoader needs async handling of JSZip mrdoob/three.js#11583

Closed
