Improve performances #56

dduponchel · 2013-06-30T15:21:39Z

The aim of this pull request is to reduce the CPU and memory consumption
of JSZip. The main idea is: don't transform the data unless it's
necessary.

Lazy decompression

The main new feature is lazy decompression. If the user loads a zip
file but only reads a single entry, we don't need to INFLATE the other
entries. Moreover, if the user then calls generate() on this JSZip
object, we won't DEFLATE every file again: we reuse the compressed data
and only recompress the entry that was read. This is the purpose of
JSZip.CompressedObject. Unfortunately, this breaks backward
compatibility: the data attribute may not have been computed yet.
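A minimal sketch of this lazy behaviour, assuming a hypothetical LazyEntry class (not JSZip's actual JSZip.CompressedObject or ZipObject API) and using Node's built-in zlib raw DEFLATE functions as a stand-in for the INFLATE/DEFLATE implementation. An entry keeps its compressed bytes, inflates them only on first read and caches the result; when generating, an entry that was never decompressed hands back its original compressed bytes, while an entry that was read is recompressed.

```javascript
// Hypothetical illustration of lazy decompression; not JSZip's real API.
// Node's zlib (raw DEFLATE, no zlib header) stands in for the INFLATE/DEFLATE code.
const zlib = require("zlib");

class LazyEntry {
  constructor(compressed) {
    this.compressed = compressed; // raw DEFLATE bytes as stored in the zip file
    this.content = null;          // decompressed bytes, filled on first access
    this.inflateCount = 0;        // counts real decompressions, for the demo
  }

  // Decompress only when the entry is actually read, then keep the result
  // so later reads do not INFLATE again.
  getContent() {
    if (this.content === null) {
      this.content = zlib.inflateRawSync(this.compressed);
      this.inflateCount++;
    }
    return this.content;
  }

  // Called when generating a new zip file.
  compressedForGenerate() {
    if (this.content === null) {
      // Never decompressed: reuse the original compressed bytes, no DEFLATE.
      return this.compressed;
    }
    // Decompressed (and possibly modified): DEFLATE the current content.
    return zlib.deflateRawSync(this.content);
  }
}

const read = new LazyEntry(zlib.deflateRawSync(Buffer.from("hello")));
const untouched = new LazyEntry(zlib.deflateRawSync(Buffer.from("world")));

console.log(read.getContent().toString()); // hello
read.getContent();                         // served from the cache
console.log(read.inflateCount, untouched.inflateCount); // 1 0
console.log(untouched.compressedForGenerate() === untouched.compressed); // true
```

Only the entry that was read pays for an INFLATE, and only that entry would be DEFLATEd again by generate().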

Don't transform the data until necessary

Another change concerns the type of ZipObject.data (renamed _data for
the reason above). Instead of converting an ArrayBuffer into a string
when it is loaded or added, we keep it as is and convert it on demand
(in the getters and in generate()), not before. The core of this change
is the JSZip.utils.transformTo function, which relies on a large matrix
(transform) to convert any supported type into any other. I think this
is worth the trouble: the other parts of JSZip only need to know the
destination type. This also means that Node.js support comes nearly for
free: the transformTo method takes care of a lot of things.
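The idea of a conversion matrix can be sketched as follows. The type names, the getTypeOf helper and the table layout are assumptions made for illustration, not the exact contents of JSZip.utils.transformTo; "string" here means a binary string holding one byte per character. The point is that callers only name the destination type.

```javascript
// Hypothetical conversion matrix in the spirit of JSZip.utils.transformTo.
// "string" is a binary string: one character per byte (0-255).

function getTypeOf(input) {
  if (typeof input === "string") return "string";
  if (input instanceof Uint8Array) return "uint8array";
  if (Array.isArray(input)) return "array";
  throw new Error("Unsupported input type");
}

// Copy the low byte of each character into a preallocated output.
function stringToBytes(str, out) {
  for (let i = 0; i < str.length; i++) {
    out[i] = str.charCodeAt(i) & 0xff;
  }
  return out;
}

// Turn byte values back into a binary string.
function bytesToString(bytes) {
  const chars = new Array(bytes.length);
  for (let i = 0; i < bytes.length; i++) {
    chars[i] = String.fromCharCode(bytes[i]);
  }
  return chars.join("");
}

const identity = (x) => x;

// transform[inputType][outputType] -> conversion function
const transform = {
  string: {
    string: identity,
    array: (s) => stringToBytes(s, new Array(s.length)),
    uint8array: (s) => stringToBytes(s, new Uint8Array(s.length)),
  },
  array: {
    string: bytesToString,
    array: identity,
    uint8array: (a) => new Uint8Array(a),
  },
  uint8array: {
    string: bytesToString,
    array: (u) => Array.from(u),
    uint8array: identity,
  },
};

// Callers only need to know the type they want back.
function transformTo(outputType, input) {
  const byType = transform[getTypeOf(input)];
  if (!(outputType in byType)) throw new Error("Unknown output type: " + outputType);
  return byType[outputType](input);
}

const bytes = transformTo("uint8array", "AB");
console.log(bytes[0], bytes[1]);           // 65 66
console.log(transformTo("string", bytes)); // AB
```

With this layout, supporting a new type (for example a Node.js Buffer) means adding one row and one column to the table rather than touching every caller.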

Update the INFLATE/DEFLATE implementation

The current implementation (from Masanao Izumo in 1999) is known to have
bugs (issues #22, #26, #29, #43, #52, #53). The new implementation comes
from https://github.com/imaya/zlib.js and doesn't have these bugs. It is
slower than the current one, but it works.

Other possible improvements

The ArrayBuffer -> unicode string transformation is not efficient. On
Firefox, the TextDecoder API solves this, but it is not (yet) available
in other browsers or in Node.js. One solution would be to read a Blob
with the FileReader API, but that would mean making our whole API
asynchronous.
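A sketch of the decoding step, assuming the input is a Uint8Array of UTF-8 bytes: it uses TextDecoder when the environment provides one and otherwise falls back to a hand-written decoder. The function names are hypothetical, and the fallback does not validate malformed sequences.

```javascript
// Hypothetical UTF-8 decoding helpers; not part of JSZip's API.

// Manual decoder for 1- to 4-byte UTF-8 sequences (no validation).
function utf8DecodeFallback(bytes) {
  const units = [];
  let i = 0;
  while (i < bytes.length) {
    const b0 = bytes[i];
    let cp;
    if (b0 < 0x80) {
      cp = b0;
      i += 1;
    } else if (b0 < 0xe0) {
      cp = ((b0 & 0x1f) << 6) | (bytes[i + 1] & 0x3f);
      i += 2;
    } else if (b0 < 0xf0) {
      cp = ((b0 & 0x0f) << 12) | ((bytes[i + 1] & 0x3f) << 6) | (bytes[i + 2] & 0x3f);
      i += 3;
    } else {
      cp = ((b0 & 0x07) << 18) | ((bytes[i + 1] & 0x3f) << 12) |
           ((bytes[i + 2] & 0x3f) << 6) | (bytes[i + 3] & 0x3f);
      i += 4;
    }
    if (cp > 0xffff) {
      // Code points above the BMP become a UTF-16 surrogate pair.
      cp -= 0x10000;
      units.push(0xd800 + (cp >> 10), 0xdc00 + (cp & 0x3ff));
    } else {
      units.push(cp);
    }
  }
  // Convert in chunks so String.fromCharCode.apply stays under argument limits.
  const parts = [];
  for (let j = 0; j < units.length; j += 0x2000) {
    parts.push(String.fromCharCode.apply(null, units.slice(j, j + 0x2000)));
  }
  return parts.join("");
}

// Prefer the native decoder when it exists.
function utf8Decode(bytes) {
  if (typeof TextDecoder !== "undefined") {
    return new TextDecoder("utf-8").decode(bytes);
  }
  return utf8DecodeFallback(bytes);
}

const encoded = new Uint8Array([0x41, 0xc3, 0xa9, 0xe2, 0x82, 0xac]);
console.log(utf8Decode(encoded)); // Aé€
```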

This commit aims to be as lazy as possible. When reading a file, we no longer decompress its content: we just keep a reference to the original compressed file and an offset. When the user accesses a file, we decompress it and replace the content (so we don't have to decompress it again). When generating a zip, if a file has not been decompressed, we check whether we can reuse the compressed content. Unfortunately, this breaks backward compatibility: the data attribute may not have been computed yet. Worse, the data can now be a string, an array or a Uint8Array, so the user must use the getters! The interface for compression/decompression has also changed: we now specify the input type for each operation. This has been tested in IE 6 to 10, Firefox, Chrome and Opera. If anyone has an Apple device with Safari, they're welcome to test :)
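The change to the compression interface (declaring an input type per operation) can be sketched like this. The descriptor fields compressInputType and uncompressInputType, the toUint8Array helper and the compressWith/uncompressWith functions are assumptions for illustration; STORE is the zip format's "no compression" method, so it simply passes bytes through. The caller normalizes whatever the entry holds (string, array or Uint8Array) to the declared type once, right before the call.

```javascript
// Hypothetical compression interface where each method declares its input type.

// Normalize the three possible data representations to a Uint8Array.
function toUint8Array(data) {
  if (data instanceof Uint8Array) return data;
  if (typeof data === "string") {
    // Binary string: keep the low byte of each character.
    const out = new Uint8Array(data.length);
    for (let i = 0; i < data.length; i++) out[i] = data.charCodeAt(i) & 0xff;
    return out;
  }
  if (Array.isArray(data)) return new Uint8Array(data);
  throw new Error("Unsupported data type");
}

const converters = { uint8array: toUint8Array };

// STORE (method 0 in the zip format) leaves the bytes untouched.
const STORE = {
  compressInputType: "uint8array",
  uncompressInputType: "uint8array",
  compress: (input) => input,
  uncompress: (input) => input,
};

function compressWith(method, data) {
  const convert = converters[method.compressInputType];
  if (!convert) throw new Error("No converter for " + method.compressInputType);
  return method.compress(convert(data));
}

function uncompressWith(method, data) {
  const convert = converters[method.uncompressInputType];
  if (!convert) throw new Error("No converter for " + method.uncompressInputType);
  return method.uncompress(convert(data));
}

console.log(compressWith(STORE, "AB").length);    // 2
console.log(uncompressWith(STORE, [1, 2, 3])[2]); // 3
```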

Use https://github.com/imaya/zlib.js instead of an old implementation.

Add the optimizedBinaryString option and hints about performance.

str += 'char' is faster than array.push() followed by array.join(), but the difference is small. Strings are immutable, so each concatenation creates a new string: building an n-character string one character at a time copies 1 + 2 + ... + n = n(n+1)/2 characters in total, i.e. O(n^2) memory consumption, whereas array.join() is O(n). When working with large files (hundreds of MB), O(n^2) is clearly not a good idea. Also, use TextDecoder when available to boost performance.
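The two strategies compared above, sketched for turning a Uint8Array into a binary string (one character per byte). Function names are hypothetical. Modern engines often represent concatenated strings as ropes, so the quadratic cost is what the immutable-string model implies in the worst case, not a guaranteed measurement.

```javascript
// Hypothetical helpers comparing concatenation with collect-then-join.

// Repeated concatenation: under the immutable-string model each += builds a
// new string, copying 1 + 2 + ... + n = n(n+1)/2 characters overall.
function bytesToBinaryStringConcat(bytes) {
  let str = "";
  for (let i = 0; i < bytes.length; i++) {
    str += String.fromCharCode(bytes[i]);
  }
  return str;
}

// Collect chunks, then join once: O(n) characters copied.
// subarray() returns a view without copying; apply() accepts typed arrays as
// array-likes. The chunk size keeps each call under argument-count limits.
function bytesToBinaryStringJoin(bytes, chunkSize = 0x8000) {
  const parts = [];
  for (let i = 0; i < bytes.length; i += chunkSize) {
    parts.push(String.fromCharCode.apply(null, bytes.subarray(i, i + chunkSize)));
  }
  return parts.join("");
}

const data = new Uint8Array(100000);
for (let i = 0; i < data.length; i++) data[i] = i % 256;
console.log(bytesToBinaryStringJoin(data) === bytesToBinaryStringConcat(data)); // true
```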

Working with strings consumes a lot of resources. For example, converting a utf8 string to a binary string is faster when going through a Uint8Array: utf8 string -> Uint8Array (via TextEncoder) -> binary string.
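A sketch of that path, assuming a hypothetical utf8ToBinaryString helper: TextEncoder produces the UTF-8 bytes as a Uint8Array, which are then turned into a binary string in chunks. Where TextEncoder is missing, it falls back to the well-known unescape(encodeURIComponent(str)) idiom, which gives the same result for well-formed strings.

```javascript
// Hypothetical helper: unicode string -> UTF-8 bytes -> binary string.
function utf8ToBinaryString(str) {
  if (typeof TextEncoder !== "undefined") {
    const bytes = new TextEncoder().encode(str); // Uint8Array of UTF-8 bytes
    const parts = [];
    for (let i = 0; i < bytes.length; i += 0x8000) {
      parts.push(String.fromCharCode.apply(null, bytes.subarray(i, i + 0x8000)));
    }
    return parts.join("");
  }
  // Fallback: encodeURIComponent emits %XX escapes for the UTF-8 bytes and
  // unescape maps each %XX back to one character. Throws on lone surrogates.
  return unescape(encodeURIComponent(str));
}

const bin = utf8ToBinaryString("é€");
console.log(bin.length); // 5 (2 bytes for "é", 3 for "€")
```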

See http://jsperf.com/array-direct-assignment-vs-push/31: direct assignment is faster than push.
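For illustration, the two ways of filling an array compared by that benchmark, applied to converting a binary string into byte values. Function names are hypothetical, and which variant is faster depends on the engine and its version.

```javascript
// Hypothetical helpers: same result, different ways of filling the array.

function toByteArrayPush(str) {
  const result = [];
  for (let i = 0; i < str.length; i++) {
    result.push(str.charCodeAt(i) & 0xff);
  }
  return result;
}

// The final length is known, so preallocate and assign by index.
function toByteArrayAssign(str) {
  const result = new Array(str.length);
  for (let i = 0; i < str.length; i++) {
    result[i] = str.charCodeAt(i) & 0xff;
  }
  return result;
}

console.log(toByteArrayAssign("Hi").join(",")); // 72,105
```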

Stuk · 2013-06-30T23:35:54Z

Looks great! 👍 Feel free to merge.

To follow semver, because of the breaking changes this should be released as 2.0.0. Does that sound ok?

dduponchel · 2013-07-01T19:10:40Z

Thanks!
I was thinking of a v2 too, to follow semver. Before any release, I have another pull request to submit (which depends heavily on this one): full Node.js support :)

dduponchel added 9 commits May 9, 2013 22:01

Replace the INFLATE/DEFLATE implementations

3754a98

Update documentation

9656c5c

Tests : better display of unexpected errors.

6c75955

inflate/deflate : v0.1.6

7f6387f

remove an old file

54dcb7e

Avoid if possible strings manipulation.

a98f7ab

performances : small gains on chrome

14217a0

dduponchel added a commit that referenced this pull request Jul 1, 2013

Merge pull request #56 from dduponchel/WIP_perfs

47f2c3f

Improve performances

dduponchel merged commit 47f2c3f into Stuk:master Jul 1, 2013

dduponchel deleted the WIP_perfs branch July 1, 2013 19:12
