
unable to unzip archives created not by JSZip #250

Closed
nikchuk opened this issue Jan 12, 2016 · 13 comments

Comments

@nikchuk

nikchuk commented Jan 12, 2016

Hi. In my case JSZip is unable to unpack zip files that were not created by JSZip (e.g. by the native Windows zipper or by 7z with deflate). The exception is "End of data reached (data length = 4042, asked index = 4168). Corrupted zip ?" What could be wrong? Thanks in advance.

@dduponchel
Collaborator

How do you get the content of the zip file? With an ajax request in a browser?
An ajax request, if not prepared correctly, will try to decode the binary content as text and corrupt it (see this page).
If this doesn't solve your issue, we will need the code that gets the content and uses JSZip, plus the zip file that fails.

@nikchuk
Author

nikchuk commented Jan 12, 2016

No, it is not an ajax request in a browser. It is a non-browser script engine in a client application.

It looks like the exception above happens when the zip contains non-text files (e.g. png):
ZipFailed.zip

One more issue: if the zip contains only a text file and was packed by another tool, the zip loads but the list of files in the JSZip object is empty:
ZipEmpty.zip

If the same text file is packed by JSZip, the zip loads correctly and the files list in JSZip is correct:
ZipOK.zip

Here is how I get the content of the zip:

var ZIP = new JSZip();
//...
var objBinaryFile = new ActiveXObject("ADODB.Stream");
objBinaryFile.type = 2;
objBinaryFile.charset = "utf-8";
objBinaryFile.Open();
objBinaryFile.LoadFromFile(zipPath);

var content = objBinaryFile.ReadText(); // content for JSZip
ZIP.load(content);
//...

@dduponchel
Collaborator

I tested the three zip files on my machine with JSZip (loaded with nodejs REPL) and zipinfo (a command line utility):

  • ZipFailed.zip is opened by both, they show the Test.xml / TestImg.png files
  • ZipEmpty.zip is opened by both, they show the Test.xml file
  • ZipOK.zip is not opened by JSZip (End of data reached (data length = 328, asked index = 44482). Corrupted zip ?) and is not opened by zipinfo (error [ZipOK.zip]: missing 44231 bytes in zipfile)

Same result with other tools.

I don't know how to test this code on my side, but I strongly suspect that ADODB.Stream doesn't give you the exact binary content. The Read method looks promising, but according to this question on StackOverflow, setting the type parameter to 2 (adTypeText) is ok...

Could you test:

  • objBinaryFile.type = 1 and objBinaryFile.Read() ?
  • objBinaryFile.type = 2, objBinaryFile.charset = "windows-1251" and objBinaryFile.ReadText() ?

I have never used ADODB.Stream, so these are guesses based on what I saw on the internet.

@nikchuk
Author

nikchuk commented Jan 13, 2016

Strange... By the way, in my case ZipOK.zip cannot be opened by other tools either; they do not recognize the file as a zip archive. But JSZip can still handle it here, contrary to your result...

Concerning the ADODB.Stream:

  • With objBinaryFile.type = 1 and objBinaryFile.Read(), the content is returned as a Variant (Array of Byte), which JSZip cannot handle.
  • With objBinaryFile.type = 2, objBinaryFile.charset = "windows-1251" and objBinaryFile.ReadText(), JSZip is able to load the content, but the content of the zipped files cannot be retrieved by any method (asText(), asBinary()...); the exception is "'zipComment.length' is null or not an object":
    30524_HandlingTrace_0000000001.zip

@nikchuk
Author

nikchuk commented Jan 13, 2016

So, I made some more attempts...

//...
// sourceName, sourceContent and zipPath declared and defined somewhere above

zip.file(sourceName, sourceContent); // create file
var content = zip.generate({type:"string", compression: "DEFLATE", compressionOptions : {level:6}}); // generate zip

var objBinaryFile = new ActiveXObject("ADODB.Stream"); // save zip as binary file
objBinaryFile.type = 2;
objBinaryFile.charset = "iso-8859-1"; 
objBinaryFile.open(); 
objBinaryFile.writeText(content);
objBinaryFile.position = 0;
objBinaryFile.saveToFile(zipPath, 2); 
objBinaryFile.close(); 
objBinaryFile = null;

Reading such a zip with JSZip:

var objBinaryFile = new ActiveXObject("ADODB.Stream"); 
objBinaryFile.type = 2;
objBinaryFile.charset = "iso-8859-1"; 
objBinaryFile.open(); 
objBinaryFile.loadFromFile(zipPath);
objBinaryFile.Position = 0;
var content = objBinaryFile.ReadText();
objBinaryFile.close(); 
objBinaryFile = null;

zip.load(content);

zip.files has the list of files now. But accessing the content of an individual file via asText() throws an exception in the function inflate(input, options):

// That will never happens, if you don't cheat with options :)
if (inflator.err) { throw inflator.msg; }
// inflator.msg is "invalid code lengths set"

To me it looks like a problem with charsets...
If I create and read the zip with objBinaryFile.charset = "utf-8", JSZip is able to get the file content via asText(), but no other tool can handle this zip. An example is ZipOK.zip from the previous comments...
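
One plausible reading of the "utf-8" behaviour, sketched in Node.js rather than with ADODB.Stream (so `Buffer`, the sample bytes and the absence of a byte order mark are all assumptions, not part of the original script): writing a binary string as UTF-8 text turns every character above 0x7F into two bytes. The file on disk is then no longer the zip that JSZip generated, even though reading it back as UTF-8 restores the same string.

```javascript
// A binary string such as zip.generate({type: "string"}) returns:
// one character per byte, code points 0..255. The sample values are made up.
var binaryString = "PK\u0003\u0004\u0089\u00ff";

// One byte per character (true ISO-8859-1 semantics).
var latin1Bytes = Buffer.from(binaryString, "latin1");

// UTF-8 encoding: characters above 0x7F become two bytes each.
var utf8Bytes = Buffer.from(binaryString, "utf8");

console.log(latin1Bytes.length); // 6
console.log(utf8Bytes.length);   // 8

// Decoding the UTF-8 bytes as UTF-8 again restores the original string,
// which would explain why JSZip can still load such a file.
var roundTripped = utf8Bytes.toString("utf8");
console.log(roundTripped === binaryString); // true
```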

@dduponchel
Collaborator

Ideally, you shouldn't have any charset issue as you handle binary data. A charset is only useful when you transform text to and from binary data, that's why the ReadText method looks suspicious. If this method corrupts the data, we should try Read.
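
As an illustration of how a text read can corrupt binary data, here is a small sketch using Node.js `Buffer` (an assumption; the thread itself runs on ADODB.Stream, which is not reproduced here). The sample bytes are made up. Decoding arbitrary bytes as UTF-8 replaces invalid sequences with U+FFFD, so the original bytes cannot be recovered, while a one-to-one encoding such as Node's "latin1" round-trips.

```javascript
// The zip local file header signature followed by a few bytes >= 0x80.
var original = Buffer.from([0x50, 0x4b, 0x03, 0x04, 0x89, 0xff, 0x00, 0x9c]);

// Bytes -> UTF-8 text -> bytes: invalid sequences are replaced, data is lost.
var viaUtf8 = Buffer.from(original.toString("utf8"), "utf8");

// Bytes -> latin1 text -> bytes: each byte maps to the code point of the
// same value and back, so nothing is lost.
var viaLatin1 = Buffer.from(original.toString("latin1"), "latin1");

console.log(original.equals(viaUtf8));   // false
console.log(original.equals(viaLatin1)); // true
```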

JSZip doesn't know how to handle a Variant, but you could try to convert it to an array of integers (<= 255). That should be a simple loop to iterate and copy all values in an array (but I can't find any example actually using a Variant).
Then, you can give this array to zip.load(array, {checkCRC32: true}).
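
The copy loop described above could look roughly like this. `toByteArray` is a hypothetical helper name, and the input is assumed to be any object exposing `length` and numeric indexes; whether a Variant returned by ADODB.Stream can be indexed this way from script is not established here.

```javascript
// Copies an array-like source into a plain JS array, checking that every
// value is an integer between 0 and 255.
function toByteArray(source) {
  var bytes = new Array(source.length);
  for (var i = 0; i < source.length; i++) {
    var value = source[i];
    if (typeof value !== "number" || value < 0 || value > 255 || value % 1 !== 0) {
      throw new Error("Not a byte at index " + i + ": " + value);
    }
    bytes[i] = value;
  }
  return bytes;
}

// Stand-in for the real data: an array-like object holding four bytes.
var bytes = toByteArray({ 0: 0x50, 1: 0x4b, 2: 0x03, 3: 0x04, length: 4 });
console.log(bytes.join(",")); // 80,75,3,4

// The result is what would then be handed to zip.load(bytes, {checkCRC32: true}).
```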

@nikchuk
Author

nikchuk commented Jan 14, 2016

I found a tricky way to convert this Variant to a normal JS array where each item is an integer <= 255... But when I try to load it with zip.load(array, {checkCRC32: true}), JSZip treats it as a uint8array, which is not supported in my case (not a browser). As a result, the exception "uint8array is not supported by this browser" occurs.

@nikchuk
Author

nikchuk commented Jan 14, 2016

Since pako supports regular JS arrays, maybe it makes sense to support them in JSZip too. It would add one more use case for such specific applications.

dduponchel added a commit to dduponchel/jszip that referenced this issue Jan 14, 2016
In the Stuk#250 case, we don't have a fancy Uint8Array but we have an unsupported
binary format. An array of bytes (numbers between 0 and 255) is the
lowest common denominator. A binary string would be awkward to build here
and building it reliably can be tricky (without filling the stack or
taking too much time/memory).
This commit adds the missing `arrayReader` needed here.

@dduponchel
Collaborator

> As a result, the exception "uint8array is not supported by this browser" occurs.

Sorry, I forgot that the fallback was the Uint8ArrayReader. I tested the code in nodejs REPL... which supports Uint8Array.

> Since pako supports regular JS arrays, maybe it makes sense to support them in JSZip too. It would add one more use case for such specific applications.

We already use pako's support of arrays on platforms that don't support Uint8Arrays. We don't support arrays in the load method because we never needed to :)
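
To illustrate what reading from a plain array involves, here is a minimal sketch of a reader over an array of bytes. It is not JSZip's actual `arrayReader`: the `ByteArrayReader` name and its methods are hypothetical, and only the wording of the bounds error is borrowed from the exception quoted earlier in this thread.

```javascript
// Hypothetical reader over a plain array of bytes (numbers 0..255).
function ByteArrayReader(bytes) {
  this.bytes = bytes;
  this.index = 0;
}

// Throws if reading `size` more bytes would run past the end of the data.
ByteArrayReader.prototype.checkOffset = function (size) {
  var asked = this.index + size;
  if (asked < 0 || asked > this.bytes.length) {
    throw new Error("End of data reached (data length = " + this.bytes.length +
      ", asked index = " + asked + "). Corrupted zip ?");
  }
};

// Reads a little-endian unsigned integer of `size` bytes, as zip headers use.
ByteArrayReader.prototype.readInt = function (size) {
  this.checkOffset(size);
  var result = 0;
  for (var i = this.index + size - 1; i >= this.index; i--) {
    result = result * 256 + this.bytes[i];
  }
  this.index += size;
  return result;
};

var reader = new ByteArrayReader([0x50, 0x4b, 0x03, 0x04]);
console.log(reader.readInt(4).toString(16)); // 4034b50
```

A further read on the same reader fails with the bounds error, which is the kind of message produced when the data is shorter than the offsets stored in it suggest.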

Could you check if it works with this branch (I built the dist files here) ?

Out of curiosity, how do you read a Variant object ?

@nikchuk
Author

nikchuk commented Jan 14, 2016

Thanks, I will check it tomorrow and give you feedback then.
Concerning the reading of the Variant: I found an idea and a general implementation on a forum and extended it a bit. It works like this:

var bogusWindows1252chars = "\u20AC\u201A\u0192\u201E\u2026\u2020\u2021" +
    "\u02C6\u2030\u0160\u2039\u0152\u017D" +
    "\u2018\u2019\u201C\u201D\u2022\u2013\u2014" +
    "\u02DC\u2122\u0161\u203A\u0153\u017E\u0178";
var correctLatin1chars = "\u0080\u0082\u0083\u0084\u0085\u0086\u0087" +
    "\u0088\u0089\u008A\u008B\u008C\u008E" +
    "\u0091\u0092\u0093\u0094\u0095\u0096\u0097" +
    "\u0098\u0099\u009A\u009B\u009C\u009E\u009F";

// This turns a string read as codepage 1252 into a boxed string with a
// byteAt method.  We also modify the slice method to return a similar object.
function binaryString(str)
{
    var r = str ? new String(str) : new String();     // always return an object with a .length
    r.byteAt = function(index)
    {
        // translate character back to originating Windows-1252 byte value
        if (this.charCodeAt(index) <= 255)
            return this.charCodeAt(index);

        var p = bogusWindows1252chars.indexOf(this.charAt(index));
        return correctLatin1chars.charCodeAt(p);
    };
    r.slice  = function(start, end)
    {
        return binaryString(this.substring(start, end));
    };
    return r;
}

// Does reverse translation from bytes back to Windows-1252 characters.  You can
// build up a string to write back to disk by concatenating a bunch of these.
function fromByte(num)
{
    var c = String.fromCharCode(num);
    var p = correctLatin1chars.indexOf(c);
    return p >= 0 ? bogusWindows1252chars.charAt(p) : c;
}

var binstream = new ActiveXObject("ADODB.Stream");
binstream.Type = 2 /*adTypeText*/;
binstream.Charset = "iso-8859-1";   // actually Windows codepage 1252
binstream.Open();
binstream.LoadFromFile(zipPath);

var content = binaryString(binstream.ReadText());
binstream.Close();
binstream = null;

var arr = [];
for(var i = 0; i < content.length; i++)
{
    arr.push(content.byteAt(i));
}

// arr repeats content of Variant which can be read as
// var binstream = new ActiveXObject("ADODB.Stream");
// binstream.Type = 1;
// binstream.Open();
// binstream.LoadFromFile(zipPath);
//var variant = binstream.Read();

So I assume that if an Array of Byte Variant is received from somewhere, it can be written into a binary Stream and then converted into an array in a similar way.
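
The same reverse mapping can be condensed into a single function, sketched below without ADODB.Stream so that it runs in any JavaScript engine. `cp1252StringToBytes` is a hypothetical name, the two tables mirror the ones above, and unlike `byteAt` this sketch throws on a character it cannot map instead of returning NaN.

```javascript
// Characters that Windows-1252 assigns to bytes in the 0x80..0x9F range,
// and the byte each one stands for (same order in both tables).
var CP1252_CHARS =
    "\u20AC\u201A\u0192\u201E\u2026\u2020\u2021" +
    "\u02C6\u2030\u0160\u2039\u0152\u017D" +
    "\u2018\u2019\u201C\u201D\u2022\u2013\u2014" +
    "\u02DC\u2122\u0161\u203A\u0153\u017E\u0178";
var CP1252_BYTES = [
    0x80, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
    0x88, 0x89, 0x8A, 0x8B, 0x8C, 0x8E,
    0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97,
    0x98, 0x99, 0x9A, 0x9B, 0x9C, 0x9E, 0x9F
];

// Turns a string that was decoded as Windows-1252 back into byte values.
function cp1252StringToBytes(str) {
    var bytes = new Array(str.length);
    for (var i = 0; i < str.length; i++) {
        var code = str.charCodeAt(i);
        if (code > 255) {
            var p = CP1252_CHARS.indexOf(str.charAt(i));
            if (p < 0) {
                throw new Error("Unexpected character at index " + i);
            }
            code = CP1252_BYTES[p];
        }
        bytes[i] = code;
    }
    return bytes;
}

console.log(cp1252StringToBytes("PK\u0003\u0004\u20AC\u0178\u00FF").join(","));
// 80,75,3,4,128,159,255
```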

@nikchuk
Author

nikchuk commented Jan 15, 2016

Thank you for the support! It works fine now.

> ZipFailed.zip is opened by both, they show the Test.xml / TestImg.png files
> ZipEmpty.zip is opened by both, they show the Test.xml file
> ZipOK.zip is not opened by JSZip (End of data reached (data length = 328, asked index = 44482). Corrupted zip ?) and is not opened by zipinfo (error [ZipOK.zip]: missing 44231 bytes in zipfile)

I have the same result now.

@dduponchel
Collaborator

Cool ! I'll create a pull request with this fix.

For the Variant transformation, I thought it would be something like

var variant = binstream.Read();
var result = new Array(variant.Length);
for (var i = 0; i < variant.Length; i++) {
  result[i] = variant[i]; // or getByte(i) ? byteAt(i) ? I don't know how to get values
}
new JSZip(result);

I was wrong :)

@dduponchel
Collaborator

Released in v2.6.0.


2 participants