Improve performances #56

Merged: 9 commits merged on Jul 1, 2013


2 participants

@dduponchel
Collaborator

The aim of this pull request is to reduce the CPU and memory consumption of
JSZip. The main idea is: don't transform the data unless it's
necessary.

Lazy decompression

The main new feature is lazy decompression. If the user loads a zip
file but only reads a single entry, we don't need to INFLATE the other
entries. Moreover, if the user calls generate() on this JSZip object, we
won't DEFLATE every file: we reuse the compressed data and only
recompress the entries that were read. This is the goal of JSZip.CompressedObject.
Unfortunately, this means that we won't be backward compatible: the data
attribute may not have been computed yet.
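A minimal sketch of the lazy-decompression idea, assuming Node's built-in `zlib` as a stand-in for the bundled INFLATE/DEFLATE code. `LazyEntry` and its methods are hypothetical names for illustration, not JSZip's actual `CompressedObject`: an entry keeps its compressed bytes, inflates them only on first access, and hands the original bytes back untouched when it was never read.

```javascript
// Sketch of lazy decompression. Node's built-in zlib stands in for the
// bundled INFLATE/DEFLATE code; LazyEntry is a hypothetical class, not
// JSZip's actual CompressedObject.
const zlib = require("zlib");

class LazyEntry {
  constructor(compressed) {
    this.compressed = compressed; // raw DEFLATE bytes as read from the zip
    this.data = null;             // decompressed content, filled on demand
    this.inflateCount = 0;        // how many times we actually inflated
  }

  // Inflate on first access only; later calls reuse the cached result.
  getContent() {
    if (this.data === null) {
      this.data = zlib.inflateRawSync(this.compressed);
      this.inflateCount++;
    }
    return this.data;
  }

  // Used when generating a new zip: an entry that was never read still has
  // valid compressed bytes, so they are reused as-is. An entry that was
  // read is recompressed from its content.
  getCompressed() {
    if (this.data === null) {
      return this.compressed;
    }
    return zlib.deflateRawSync(this.data);
  }
}

const text = "hello hello hello hello";
const entry = new LazyEntry(zlib.deflateRawSync(Buffer.from(text, "utf8")));
const reused = entry.getCompressed();              // no inflate happened
const first = entry.getContent().toString("utf8"); // inflates once
entry.getContent();                                // served from the cache
```

Generating an archive from entries that were never read then costs no INFLATE and no DEFLATE at all, which is where most of the savings come from.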

Don't transform the data until necessary

Another change concerns the type of ZipObject.data (renamed to _data for the
reason above). Instead of converting it into a string when
loading or adding an ArrayBuffer, we keep it as is and convert it on
demand (getters, generate), not before. The central part of this change
is the JSZip.utils.transformTo function, which adds a large matrix
(transform) to convert any supported type into any other. I
think this is worth the trouble: the other parts of JSZip only need to
know the destination type. This also means that Node.js support comes
nearly for free: the transformTo method takes care of a lot of
things.
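To illustrate the transformTo idea, here is a minimal sketch of a conversion table indexed first by input type, then by output type. The type names (`"string"`, `"array"`, `"uint8array"`, `"arraybuffer"`), `getTypeOf` and the helper functions are assumptions for illustration, not JSZip's actual code; strings are treated as binary strings (one byte per character).

```javascript
// Sketch of a "convert any supported type into any other" table, in the
// spirit of JSZip.utils.transformTo. Names and types are assumptions.

// Detect which supported representation `input` uses.
function getTypeOf(input) {
  if (typeof input === "string") return "string";
  if (Array.isArray(input)) return "array";
  if (input instanceof Uint8Array) return "uint8array";
  if (input instanceof ArrayBuffer) return "arraybuffer";
  throw new Error("unsupported input type");
}

// Binary string (one char per byte) -> array of byte values.
function stringToArray(str) {
  const result = new Array(str.length);
  for (let i = 0; i < str.length; i++) result[i] = str.charCodeAt(i) & 0xff;
  return result;
}

// Array-like of byte values -> binary string.
function arrayLikeToString(bytes) {
  const parts = new Array(bytes.length);
  for (let i = 0; i < bytes.length; i++) parts[i] = String.fromCharCode(bytes[i]);
  return parts.join("");
}

// transform[inputType][outputType] converts one representation to another.
const transform = {
  string: {
    string: (x) => x,
    array: (x) => stringToArray(x),
    uint8array: (x) => Uint8Array.from(stringToArray(x)),
    arraybuffer: (x) => Uint8Array.from(stringToArray(x)).buffer,
  },
  array: {
    string: (x) => arrayLikeToString(x),
    array: (x) => x,
    uint8array: (x) => Uint8Array.from(x),
    arraybuffer: (x) => Uint8Array.from(x).buffer,
  },
  uint8array: {
    string: (x) => arrayLikeToString(x),
    array: (x) => Array.from(x),
    uint8array: (x) => x,
    // Copy only the viewed bytes, in case x is a view on a larger buffer.
    arraybuffer: (x) => x.buffer.slice(x.byteOffset, x.byteOffset + x.byteLength),
  },
  arraybuffer: {
    string: (x) => arrayLikeToString(new Uint8Array(x)),
    array: (x) => Array.from(new Uint8Array(x)),
    uint8array: (x) => new Uint8Array(x),
    arraybuffer: (x) => x,
  },
};

// Callers only state the type they want; the input type is detected.
function transformTo(outputType, input) {
  return transform[getTypeOf(input)][outputType](input);
}
```

The design choice mirrors the text: code that needs, say, a Uint8Array just calls `transformTo("uint8array", data)` and never has to know how the data was originally loaded.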

Update the INFLATE/DEFLATE implementation

The current implementation (by Masanao Izumo, from 1999) is known to have
bugs (issues #22, #26, #29, #43, #52, #53). The new implementation comes
from https://github.com/imaya/zlib.js and doesn't have these bugs.
It is slower than the current one, but it works.

Other possible improvements

The ArrayBuffer -> unicode string transformation is not efficient. On
Firefox, the TextDecoder API solves this, but it is not (yet) available
in other browsers or in Node.js. A solution is to use the FileReader
API to read a Blob as text, but that means making our whole API asynchronous.
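One way to use TextDecoder where it exists without depending on it is feature detection with a hand-written fallback. This is a sketch only; `decodeUtf8` and `decodeUtf8Manual` are hypothetical names, and the fallback assumes well-formed UTF-8 (it does not validate or replace invalid sequences as TextDecoder does).

```javascript
// Manual UTF-8 decoder: bytes -> JavaScript string.
// Assumes well-formed input; no validation of continuation bytes.
function decodeUtf8Manual(bytes) {
  const out = [];
  let i = 0;
  while (i < bytes.length) {
    const b = bytes[i];
    let cp;
    if (b < 0x80) {            // 1 byte: 0xxxxxxx
      cp = b;
      i += 1;
    } else if (b < 0xe0) {     // 2 bytes: 110xxxxx 10xxxxxx
      cp = ((b & 0x1f) << 6) | (bytes[i + 1] & 0x3f);
      i += 2;
    } else if (b < 0xf0) {     // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
      cp = ((b & 0x0f) << 12) | ((bytes[i + 1] & 0x3f) << 6) | (bytes[i + 2] & 0x3f);
      i += 3;
    } else {                   // 4 bytes: 11110xxx then three 10xxxxxx
      cp = ((b & 0x07) << 18) | ((bytes[i + 1] & 0x3f) << 12) |
           ((bytes[i + 2] & 0x3f) << 6) | (bytes[i + 3] & 0x3f);
      i += 4;
    }
    // fromCodePoint emits a surrogate pair for code points above 0xFFFF.
    out.push(String.fromCodePoint(cp));
  }
  return out.join("");
}

// Prefer the native TextDecoder when the runtime provides it.
function decodeUtf8(bytes) {
  if (typeof TextDecoder !== "undefined") {
    return new TextDecoder("utf-8").decode(bytes);
  }
  return decodeUtf8Manual(bytes);
}
```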

dduponchel added some commits May 9, 2013
dduponchel: Lazy decompress data
This commit aims to be as lazy as possible:

When reading a file, we no longer decompress the content; we just keep
a reference to the original compressed file and an offset.

If the user accesses a file, we will decompress it and replace the
content (so we don't have to decompress it again).

When generating a zip, if a file has not been decompressed we check if
we can reuse the compressed content.

Unfortunately, this means that we won't be backward compatible: the
data attribute may not have been computed yet. Worse, the data can now be a
string, an array or a Uint8Array. The user must use the getters!

The interface for compression/decompression has also changed : we now
specify the input type for each operation.

This has been tested in IE 6 to 10, Firefox, Chrome and Opera.
If anyone has an Apple device with Safari, they're welcome to test it :)
ea2aaf5
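The commit above says the compression interface now specifies the input type of each operation. A hedged sketch of what such a registry could look like: Node's `zlib` stands in for the bundled DEFLATE code, and the property names `compressInputType` / `uncompressInputType` and the helpers `toType` / `compressWith` are assumptions for illustration.

```javascript
// Sketch: each compression method declares the input type it expects, so the
// caller converts the data once, right before (de)compressing.
// Property and helper names are assumptions, not necessarily JSZip's.
const zlib = require("zlib");

const compressions = {
  STORE: {
    magic: "\x00\x00",          // zip compression method 0, little-endian
    compressInputType: null,    // null: any type, returned unchanged
    uncompressInputType: null,
    compress: (input) => input,
    uncompress: (input) => input,
  },
  DEFLATE: {
    magic: "\x08\x00",          // zip compression method 8, little-endian
    compressInputType: "uint8array",
    uncompressInputType: "uint8array",
    compress: (input) => new Uint8Array(zlib.deflateRawSync(input)),
    uncompress: (input) => new Uint8Array(zlib.inflateRawSync(input)),
  },
};

// Hypothetical helper: converts a binary string to a Uint8Array when the
// method asks for "uint8array"; any other combination is passed through.
function toType(type, input) {
  if (type !== "uint8array" || typeof input !== "string") return input;
  const bytes = new Uint8Array(input.length);
  for (let i = 0; i < input.length; i++) bytes[i] = input.charCodeAt(i) & 0xff;
  return bytes;
}

// Convert to whatever the chosen method declares, then compress.
function compressWith(methodName, content) {
  const method = compressions[methodName];
  return method.compress(toType(method.compressInputType, content));
}
```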
dduponchel: Replace the INFLATE/DEFLATE implementations
Use https://github.com/imaya/zlib.js instead of an old implementation.
3754a98
dduponchel: Update documentation
Add the optimizedBinaryString option and hints about performance.
9656c5c
dduponchel: Tests: better display of unexpected errors. 6c75955
dduponchel: Optimize memory consumption instead of CPU consumption
str += 'char' is faster than array.push && array.join, but the difference
is not big.
Strings are immutable, so building a string of n characters by concatenation
creates n intermediate strings totalling about n(n+1)/2 characters, i.e. a
memory consumption in O(n^2). The array join is in O(n).
When working with large files (hundreds of MB), O(n^2) is clearly not a
good idea.

Also, use TextDecoder if available to boost performance.
b84e00d
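The trade-off described in the commit above can be made concrete with two ways of turning byte values into a binary string. This is a sketch with hypothetical function names; note that modern engines such as V8 often represent `+=` results as rope-like structures, so the actual memory profile of the first version depends on the engine.

```javascript
// Two ways to turn an array of byte values into a binary string
// (one character per byte).

// Grows a string with +=: each step may allocate a new, longer string.
function toBinaryStringConcat(bytes) {
  let str = "";
  for (let i = 0; i < bytes.length; i++) {
    str += String.fromCharCode(bytes[i]);
  }
  return str;
}

// Collects one-character strings and joins them once at the end.
function toBinaryStringJoin(bytes) {
  const parts = [];
  for (let i = 0; i < bytes.length; i++) {
    parts.push(String.fromCharCode(bytes[i]));
  }
  return parts.join("");
}
```

Both produce the same string; the second keeps only the small pieces and the final result alive, which is what matters for files of hundreds of MB.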
dduponchel: inflate/deflate: v0.1.6 7f6387f
dduponchel: remove an old file 54dcb7e
dduponchel: Avoid string manipulation where possible.
Working with strings consumes a lot of resources. For example, converting
a unicode string into a UTF-8 binary string is faster through a Uint8Array:
unicode string -> Uint8Array (via TextEncoder) -> binary string.
a98f7ab
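A sketch contrasting the two paths from the commit above. The string-only variant uses the well-known `unescape(encodeURIComponent(str))` idiom, which is an assumption about what a string-based path looks like, not necessarily what JSZip used; the function names are hypothetical, and the chunk size is an arbitrary value chosen to stay well below engine limits on argument counts for `apply`.

```javascript
// String-only path: a common idiom producing a UTF-8 binary string.
function utf8EncodeViaStrings(str) {
  return unescape(encodeURIComponent(str));
}

// Typed-array path: TextEncoder yields UTF-8 bytes directly, then the bytes
// are turned into a binary string (one char per byte) chunk by chunk.
function utf8EncodeViaBytes(str) {
  const bytes = new TextEncoder().encode(str);
  const CHUNK = 8192; // keeps fromCharCode.apply's argument count small
  const parts = [];
  for (let i = 0; i < bytes.length; i += CHUNK) {
    parts.push(String.fromCharCode.apply(null, bytes.subarray(i, i + CHUNK)));
  }
  return parts.join("");
}
```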
dduponchel: Performance: small gains on Chrome
According to http://jsperf.com/array-direct-assignment-vs-push/31, direct
assignment is faster than push().
14217a0
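The two array-filling styles compared in the commit above look like this. The function names are hypothetical, and the speed difference is engine- and version-dependent, so this sketch only shows the pattern, not a benchmark result.

```javascript
// Fill an array of n byte values with push().
function fillWithPush(n) {
  const arr = [];
  for (let i = 0; i < n; i++) arr.push(i & 0xff);
  return arr;
}

// Preallocate the final length, then assign by index.
function fillByIndex(n) {
  const arr = new Array(n);
  for (let i = 0; i < n; i++) arr[i] = i & 0xff;
  return arr;
}
```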
@Stuk
Owner
Stuk commented Jun 30, 2013

Looks great! 👍 Feel free to merge.

To follow semver, because of the breaking changes this should be released as 2.0.0. Does that sound ok?

@dduponchel
Collaborator

Thanks!
I was also thinking of a v2, to follow semver. Before any release, I have another pull request to open (which depends heavily on this one): full Node.js support :)

dduponchel merged commit 47f2c3f into Stuk:master on Jul 1, 2013
dduponchel deleted the dduponchel:WIP_perfs branch on Jul 1, 2013