New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

async / stream support #195

Merged
merged 33 commits into from Sep 13, 2015

Conversation

Projects
None yet
5 participants
@dduponchel
Collaborator

dduponchel commented Jan 8, 2015

This pull request replaces #141.
The current API is synchronous : if JSZip takes too much time to finish its task, the page crashes
(it freezes during the task anyway). This commit does a the following :

  • rewrite the code into workers which are asynchronous
  • replace the public methods
  • add nodejs stream support
  • break the compatibility with existing code

Note : I created a branch jszip_v3 which I target : a change this big can't go
to the current v2.

Workers

A worker is like a nodejs stream but with some differences. On the good side :

  • it works on IE 6-9 without any issue / polyfill
  • it weights less than the full dependencies bundled with browserify
  • it forwards errors (no need to declare an error handler EVERYWHERE)

On the bad side :
It adds new code to maintain, fix, optimize.

A chunk is an object with 2 attributes : meta and data. The former is an
object containing anything (percent for example), see each worker for more
details. The latter is the real data (String, Uint8Array, etc).

Public API

I've updated my gh-pages with the latest API documentation.

Getters are replaced by nodeStream(type) and async(type), generate() is replaced by generateNodeStream and generateAsync. The "async" methods return a Promise, the "stream" methods return a Stream3 (from nodejs) if available.
They have an update callback to get access to metadata like the
name of the current file being compressed or the operation progress (with percent).

With this pull request, the API will be :

// generate
zip.generateAsync(...) // return a Promise of the content
zip.generateNodeStream(...) // return the node stream
zip.generateInternalStream(...) // return the internal stream to let the user write glue code

// getters
zip.async("arraybuffer") // return a Promise
zip.nodeStream("arraybuffer") // return a nodejs stream
zip.internalStream(...) // return the internal stream to let the user write glue code

The previous sync methods now throw exceptions.

Nodejs stream support

file(name, data) now accepts a nodejs stream as data. The "nodeStream" methods generates streams.

Breaking changes

The undocumented JSZip.compressions object changed : the object now returns
workers to do the job, the previous methods are not used anymore.

A lot of sync methods has been removed.

dduponchel added some commits Feb 8, 2014

allow the JIT compilation of JSZip.base64
The index went too far, causing the optimizer to drop the compiled code.
function arrayLikeToString : allow JIT
Because of the try/catch, the JIT compiler can't do its magic.
This patch moves some of the hot code into separate functions : those
functions can be compiled, the try/catch remains in the main loop.
unicode path/comment : skip a step
When loading a zip file with unicode path/comment, we were combining two costly
operations : read data as (binary) string and then convert the binary string
to a real string. This patch skip the first string conversion.
optimize the call to file(name)
Optimize the call to file(name) : instead of using filter (in O(n)),
fetch the entry directly (in O(1)).
extract CompressedObject and simplify its code
The previous code used a closure to keep a reference to the compressed
content and pointers to the beginning/end of the file in this content.
Then we had some complicated code to extract the content and handle the
corner cases.
This new version adds two new ("private" with a _) methods to the
ZipObject instances : _compress and _decompress. A ZipObject instance
will have two states : compressed (with a CompressedObject) and
decompressed (with a string, uint8array, etc).
With this patch, no more closure : for each file we create a
sub{string,array} for the data and then create a CompressedObject.
read signature : don't call readString if possible
The transformation uint8array -> string (to check against the signature)
can be costly if repeated for a lot of entries. This commit uses the
reader to check the signature, speeding things with a Uint8ArrayReader.
refactor : move the generate code to generate.js
The separation between the generation of the CompressedObjects and the
actual use of them will help for the next commit, adding generateAsync().
Move logic into fileAdd.
The "content is empty" check can be done at the insertion of the content
and not when compressing it.
Rewrite code into workers
This commit addresses the timeout issue. The current API is synchronous : if
JSZip takes too much time to finish its task, the page crashes
(it freezes during the task anyway). This commit does a the following :

- rewrite the code into workers which can be asynchronous
- add the needed public methods
- add nodejs stream support
- break the compatibility with existing code

Workers
-------

A worker is like a nodejs stream but with some differences. On the good side :

- it works on IE 6-9 without any issue / polyfill
- it weights less than the full dependencies bundled with browserify
- it forwards errors (no need to declare an error handler EVERYWHERE)

On the bad side :
To get sync AND async methods on the public API without duplicating a lot of
code, this class has `isSync` attribute and some if/then to choose between
doing stuff now, or using an async callback. It is dangerously close to
releasing Zalgo (see http://blog.izs.me/post/59142742143/designing-apis-for-asynchrony for more).

A chunk is an object with 2 attributes : `meta` and `data`. The former is an
object containing anything (`percent` for example), see each worker for more
details. The latter is the real data (String, Uint8Array, etc).

Public API
----------

Each method generating data (generate, asText, etc) gain a stream sibling :
generateStream, asTextStream, etc.
This will need a solid discussion because I'm not really satified with this.

Nodejs stream support
---------------------

With this commit, `file(name, data)` accepts a nodejs stream as data. It also
adds a `asNodejsStream()` on the StreamHelper.

Breaking changes
----------------

The undocumented JSZip.compressions object changed : the object now returns
workers to do the job, the previous methods are not used anymore.

Not broken yet, but the the `checkCRC32` (when loading a zip file, it
synchronously check the crc32 of every files) will need to be replaced.
Update the Saucelabs dependencies / configuration
This commit updates the saucelabs dependencies to the latest and update
the configuration :

- wait up to 10minutes, IE 6 is slow
- launch max 3 parallel jobs
- add IE6, IE11, safari 7, safari 8
Update browserify
Grunt-browserify adde in v3.2.0 a `banner` option, use it. Also ignore
the lib/nodejs/* files : they only make sense with streams.

This was referenced Feb 17, 2015

Fix the nodejs bridge.
When paused, a nodejs stream CAN and WILL emit "end" events.
@dduponchel

This comment has been minimized.

Show comment
Hide comment
@dduponchel

dduponchel Apr 6, 2015

Collaborator

Keeping the sync methods will be too much troubles, I'll remove them.

That would give

zip.file("content.txt").async("uint8array", function onComplete(){...});
zip.file("content.txt").async({
    type: "uint8array",
    onUpdate: function () {...} // allow progress update notifications
}, function onComplete(){...});

zip.file("content.txt").stream("uint8array").on("data", ...); // our simple stream
zip.file("content.txt").nodeStream("uint8array").on("data", ...); // nodejs stream

zip.generateAsync({...}, function onComplete(){...});
zip.generateStream({...}).on("data", ...);
zip.generateNodeStream({...}).on("data", ...);

The old methods will throw an exception like "This method has been removed in JSZip 3.0, please check the upgrade guide.".

I'll remove the synchronous .load() method too : a sync checkCRC32 option isn't possible anymore and an async method allows more things (loading from a Blob, from a nodejs file descriptor, etc).

For the async operations, ES6 Promises look like the right choice (instead of manually adding a callback) but that means adding more bytes in the build.

@Stuk Does this sound reasonable ?

Collaborator

dduponchel commented Apr 6, 2015

Keeping the sync methods will be too much troubles, I'll remove them.

That would give

zip.file("content.txt").async("uint8array", function onComplete(){...});
zip.file("content.txt").async({
    type: "uint8array",
    onUpdate: function () {...} // allow progress update notifications
}, function onComplete(){...});

zip.file("content.txt").stream("uint8array").on("data", ...); // our simple stream
zip.file("content.txt").nodeStream("uint8array").on("data", ...); // nodejs stream

zip.generateAsync({...}, function onComplete(){...});
zip.generateStream({...}).on("data", ...);
zip.generateNodeStream({...}).on("data", ...);

The old methods will throw an exception like "This method has been removed in JSZip 3.0, please check the upgrade guide.".

I'll remove the synchronous .load() method too : a sync checkCRC32 option isn't possible anymore and an async method allows more things (loading from a Blob, from a nodejs file descriptor, etc).

For the async operations, ES6 Promises look like the right choice (instead of manually adding a callback) but that means adding more bytes in the build.

@Stuk Does this sound reasonable ?

@andrewvarga

This comment has been minimized.

Show comment
Hide comment
@andrewvarga

andrewvarga May 18, 2015

Is this going to be part of the stable release some time? (Or any other async (webworker) support?

andrewvarga commented May 18, 2015

Is this going to be part of the stable release some time? (Or any other async (webworker) support?

@dduponchel

This comment has been minimized.

Show comment
Hide comment
@dduponchel

dduponchel May 26, 2015

Collaborator

I'm still working on it... When I have enough time. I may have some time this weekend to continue working on the async load.

Collaborator

dduponchel commented May 26, 2015

I'm still working on it... When I have enough time. I may have some time this weekend to continue working on the async load.

@andrewvarga

This comment has been minimized.

Show comment
Hide comment
@andrewvarga

andrewvarga May 26, 2015

Let me know if I can help, I could actually make it work in a webworker really easily by just adding some wrapper methods on top of jszip.min.js like:

self.onmessage = function(message) {
    var messageData = message.data;
    var command = messageData.command;
    var args = messageData.args;
    var requestId = messageData.requestId;

    switch (command) {
        case "load":
                   jsZip = new JSZip();
            jsZip.load(args[0]);

            self.postMessage({
                requestId: requestId
            });
                       break;
        case "parse":
           // ... etc

This way I didn't need to touch the actual JSZIP code at all, it just allows to call its functions in a webworker.

andrewvarga commented May 26, 2015

Let me know if I can help, I could actually make it work in a webworker really easily by just adding some wrapper methods on top of jszip.min.js like:

self.onmessage = function(message) {
    var messageData = message.data;
    var command = messageData.command;
    var args = messageData.args;
    var requestId = messageData.requestId;

    switch (command) {
        case "load":
                   jsZip = new JSZip();
            jsZip.load(args[0]);

            self.postMessage({
                requestId: requestId
            });
                       break;
        case "parse":
           // ... etc

This way I didn't need to touch the actual JSZIP code at all, it just allows to call its functions in a webworker.

dduponchel added some commits Apr 5, 2015

[breaking change] Remove sync methods, part 1.
This commit removes most of the sync methods (everything but the load
method).
The new getters are :

- zip.async(type) : es6 promises
- zip.stream(type) : nodejs Stream3
- zip.internalStream(type) : our internal stream (useful to create wrappers).

The new generate methods are :
- zip.generateAsync() : es6 promises
- zip.generateStream() : nodejs Stream3
- zip.generateInternalStream() : our internal stream

With the "stream" keyword, everyone will think of the nodejs stream :
it's safer to use "internalStream" for our own.
The old sync versions will throw an exception.

Also fix the stream version we use : we use Stream3, not the stream
offered by the current node runtime.

Also changed in generate : prefer "binarystring" over "string".
That way, JSZip#generate will be consistent with ZipObject#async.
"string" is still supported because a zip as a text don't make sense.
[breaking change] Remove sync methods, part 2.
This commit remove the sync "load()" method. It has been replaced by
"loadAsync()" which returns a Promise.
@dduponchel

This comment has been minimized.

Show comment
Hide comment
@dduponchel

dduponchel Jun 2, 2015

Collaborator

I've updated the pull request with async methods and I've updated the description. I've updated my gh-pages with the latest API documentation.
The next step will be to merge this with master.

@andrewvarga doesn't importScripts solve this issue ? Making JSZip Worker aware may give a strange API.

Collaborator

dduponchel commented Jun 2, 2015

I've updated the pull request with async methods and I've updated the description. I've updated my gh-pages with the latest API documentation.
The next step will be to merge this with master.

@andrewvarga doesn't importScripts solve this issue ? Making JSZip Worker aware may give a strange API.

@balaclark

This comment has been minimized.

Show comment
Hide comment
@balaclark

balaclark Jul 11, 2015

Looks great, async is very necessary for large files.

balaclark commented Jul 11, 2015

Looks great, async is very necessary for large files.

dduponchel added some commits Jul 6, 2015

Use setImmediate instead of setTimeout.
setTimeout has 4ms minimum delay while on a recent browser a chunk takes
~0.2ms of processing. removing these 4ms make a huge difference.

Main downside: the polyfill globally declares setImmediate.
@dduponchel

This comment has been minimized.

Show comment
Hide comment
@dduponchel

dduponchel Jul 18, 2015

Collaborator

Up-to-date with v2.5.0 (I moved the branch jszip_v3 to this tag). @Stuk are you ok with this pull request ?

Collaborator

dduponchel commented Jul 18, 2015

Up-to-date with v2.5.0 (I moved the branch jszip_v3 to this tag). @Stuk are you ok with this pull request ?

dduponchel added some commits Aug 2, 2015

Prefix nodejs stream methods with "node".
Browsers are also adding streams, see [1], [2], [3]. While the spirit is
the same, the API is different from nodejs streams. To avoid future
confusion, I prefer renaming `generateStream` to `generateNodeStream`
and `stream` to `nodeStream` (we already have `nodebuffer` as type).

  [0]: https://streams.spec.whatwg.org/
  [1]: https://blog.wanderview.com/blog/2015/06/19/intent-to-implement-streams-in-firefox/
  [2]: https://www.chromestatus.com/features/5804334163951616
Add "text" as an alias of "string".
When I migrated the tests from `asText()` to `async("string")`, I typed
a lot of `async("text")`. I strongly suspect that other users will have
the issue too while migrating.
@dduponchel

This comment has been minimized.

Show comment
Hide comment
@dduponchel

dduponchel Aug 2, 2015

Collaborator

I've prefixed stream methods with "node": browsers are also adding streams, see 1, 2, 3. While the spirit is the same, the API is different from nodejs streams. To avoid future confusion, I prefer renaming generateStream to generateNodeStream and stream to nodeStream (we already have nodebuffer as type).

Collaborator

dduponchel commented Aug 2, 2015

I've prefixed stream methods with "node": browsers are also adding streams, see 1, 2, 3. While the spirit is the same, the API is different from nodejs streams. To avoid future confusion, I prefer renaming generateStream to generateNodeStream and stream to nodeStream (we already have nodebuffer as type).

Show outdated Hide outdated lib/generate.js
.on("data", function (chunk) {
self.push({
data: chunk,
meta : {

This comment has been minimized.

@Stuk

Stuk Aug 5, 2015

Owner

Is this correct? It will always be 0 percent?

@Stuk

Stuk Aug 5, 2015

Owner

Is this correct? It will always be 0 percent?

This comment has been minimized.

@dduponchel

dduponchel Aug 6, 2015

Collaborator

We don't know the total content length that will be emitted by the nodejs stream.
When generating a zip file from streams, the update function of generateAsync() will get abrupt changes in the total percent:

{ currentFile: 'dist/', percent: 0 }
{ currentFile: 'dist/jszip.js', percent: 33.333333333333336 }
[...]
{ currentFile: 'dist/jszip.js', percent: 33.333333333333336 }
{ currentFile: 'dist/jszip.min.js', percent: 66.66666666666667 }
[...]
{ currentFile: 'dist/jszip.min.js', percent: 66.66666666666667 }
{ currentFile: null, percent: 100 }
[...]
{ currentFile: null, percent: 100 }

We get a better "curve" if we have the full content in memory (nodejs Buffer, ArrayBuffer, etc):

{ currentFile: 'dist/', percent: 0 }
{ currentFile: 'dist/jszip.js', percent: 33.333333333333336 }
{ currentFile: 'dist/jszip.js', percent: 34.813265189986616 }
{ currentFile: 'dist/jszip.js', percent: 36.2931970466399 }
[...]
{ currentFile: 'dist/jszip.js', percent: 64.41190232305222 }
{ currentFile: 'dist/jszip.js', percent: 65.8918341797055 }
[...]
{ currentFile: 'dist/jszip.min.js', percent: 94.72415163097831 }
{ currentFile: 'dist/jszip.min.js', percent: 100 }
{ currentFile: null, percent: 100 }
[...]
{ currentFile: null, percent: 100 }

The result is not perfect with streams but it seems good enough.

@dduponchel

dduponchel Aug 6, 2015

Collaborator

We don't know the total content length that will be emitted by the nodejs stream.
When generating a zip file from streams, the update function of generateAsync() will get abrupt changes in the total percent:

{ currentFile: 'dist/', percent: 0 }
{ currentFile: 'dist/jszip.js', percent: 33.333333333333336 }
[...]
{ currentFile: 'dist/jszip.js', percent: 33.333333333333336 }
{ currentFile: 'dist/jszip.min.js', percent: 66.66666666666667 }
[...]
{ currentFile: 'dist/jszip.min.js', percent: 66.66666666666667 }
{ currentFile: null, percent: 100 }
[...]
{ currentFile: null, percent: 100 }

We get a better "curve" if we have the full content in memory (nodejs Buffer, ArrayBuffer, etc):

{ currentFile: 'dist/', percent: 0 }
{ currentFile: 'dist/jszip.js', percent: 33.333333333333336 }
{ currentFile: 'dist/jszip.js', percent: 34.813265189986616 }
{ currentFile: 'dist/jszip.js', percent: 36.2931970466399 }
[...]
{ currentFile: 'dist/jszip.js', percent: 64.41190232305222 }
{ currentFile: 'dist/jszip.js', percent: 65.8918341797055 }
[...]
{ currentFile: 'dist/jszip.min.js', percent: 94.72415163097831 }
{ currentFile: 'dist/jszip.min.js', percent: 100 }
{ currentFile: null, percent: 100 }
[...]
{ currentFile: null, percent: 100 }

The result is not perfect with streams but it seems good enough.

Show outdated Hide outdated lib/utils.js

dduponchel added some commits Aug 7, 2015

Extract ZipFileWorker in its own file.
Split the generate.js file into generate/index.js and
generate/ZipFileWorker.js.
Workers: propagate errors upward with ".error()".
The error event wasn't sent upward, the parent workers were left paused
(or running).
Also make ".pause()" and ".resume()" return boolean, extending the
default behavior is easier with it.
Replace setimmediate2 with asap.
The module `asap` doesn't update the global scope. The only downside: we
can't cancel the async call anymore. Instead, we now let the async call
happen and return early if the stream is paused or finished.
@Stuk

This comment has been minimized.

Show comment
Hide comment
@Stuk

Stuk Sep 12, 2015

Owner

👍 is this ready to me merged then?

Owner

Stuk commented Sep 12, 2015

👍 is this ready to me merged then?

@dduponchel

This comment has been minimized.

Show comment
Hide comment
@dduponchel

dduponchel Sep 13, 2015

Collaborator

@Stuk this pull request is ready, I'll continue on #224.

Collaborator

dduponchel commented Sep 13, 2015

@Stuk this pull request is ready, I'll continue on #224.

Stuk added a commit that referenced this pull request Sep 13, 2015

@Stuk Stuk merged commit c1260c8 into Stuk:jszip_v3 Sep 13, 2015

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@bitinn bitinn referenced this pull request Oct 18, 2015

Closed

Use case for res.arrayBuffer #51

@Krisa

This comment has been minimized.

Show comment
Hide comment
@Krisa

Krisa Nov 13, 2015

damned, this is amazing, seriously. v2.5 just failed and crashed after 1 minute or so on few hundreds files weighting between 10KB to 500KB each and this new version just did the whole job in half a second! Unless there are some bugs (which I have not found on my few basic tests), it would make sense it becomes the new official version.

Small info: grunt (which I had to use to produce the dist v3.0) is somehow outdated with its uglify dependency and throws an error. Updating uglify to the latest version simply fixed the problem.

Krisa commented Nov 13, 2015

damned, this is amazing, seriously. v2.5 just failed and crashed after 1 minute or so on few hundreds files weighting between 10KB to 500KB each and this new version just did the whole job in half a second! Unless there are some bugs (which I have not found on my few basic tests), it would make sense it becomes the new official version.

Small info: grunt (which I had to use to produce the dist v3.0) is somehow outdated with its uglify dependency and throws an error. Updating uglify to the latest version simply fixed the problem.

@dduponchel dduponchel referenced this pull request Feb 10, 2016

Closed

Async support #121

@dduponchel dduponchel deleted the dduponchel:async branch Mar 24, 2016

@dduponchel dduponchel referenced this pull request Apr 12, 2016

Merged

Release 3.0.0 #278

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment