RangeError - when adding just over 2,400 files #343

hashitha · 2016-08-23T13:44:05Z

I am getting RangeError when I add more than 2400 files, Total size of the final zip file would be around 1.3gb. Works fine for anything less than that.

 zip.generateAsync({ type: "blob"  }, function onUpdateCallback(metadata) {
                       console.log(metadata);  //This goes to 100% then RangeError
                    }
                    ).then(function (content) { 
                        saveAs(content,  "test.zip");
                      });

The text was updated successfully, but these errors were encountered:

dduponchel · 2016-08-23T18:47:27Z

To generate a 1.3GB zip file with generateAsync, JSZip currently needs 3 times this amount:

one time for the original, non compressed content
one time for the intermediary chunks
one time for the final zip file

The first one could be mitigated by fixing #290. The second one could be the easiest improvement. The third one is needed by generateAsync.

Could you describe your use case ?
What are these 2400 files ? Files generated client side or real files from a <input type="file" multiple /> ? Do you add them as a Blob, as an ArrayBuffer, as a String, etc ?

hashitha · 2016-08-24T00:05:45Z

Thanks for the info,

Basically we are writing a medical app that downloads patient CT, Ultrasound, X-Ray, MRI etc.
And some CTs and MRIs can have over 2k images (DICOM files). So when the user clicks download I am using JSZipUtils.getBinaryContent to download the individual images and the combine all those images to one zip file (without compression) using JSZip. This work really well until we hit this limit, then it throws exception at the very end just after onUpdateCallback returns 100%. I am adding them as blob.

We have written all our back-end code in aws lambda, and lambda has a limitation of 500mb disk space so i am unable to compress it from there and serve. Our only other option is to have a server running which I really don't won't to do after all the work we have done to go server-less.

dduponchel · 2016-08-24T20:42:27Z

Do you have a constraint on the browser ? Using generateInternalStream instead of generateAsync would remove a lot of memory pressure but needs a browser supporting web streams (only chrome right now). Then, with StreamSaver.js, save the stream. The stream API are not compatible, but it should be easy to convert them (disclaimer: I never tested this code):

var writeStream = streamSaver.createWriteStream('output.zip');

zip
.generateInternalStream({type:"uint8array"})
.on('data', function (data, metadata) {
    writeStream.write(data);
})
.on('error', function (e) {
    writeStream.abort(e);
})
.on('end', function () {
    writeStream.close();
});
.resume();

Without streams, I don't see any magical solution. I'm modifying the accumulate method to improve memory usage with blobs (which seem to be moved from the memory under pressure) but the results will depends on the browser implementation.

hashitha · 2016-08-25T01:25:35Z

I have tried it with the latest chrome and I keep getting "WritableStream is not defined", event using the examples of StreamSaver doesn't work.

I think our only other option is to zip part by part. e.g. every 1,000 images we create one zip file and give to user part1.zip, part2.zip etc. It is of an inconvenience to the user but I don't see any other way at this stage.

jimmywarting · 2016-08-25T10:26:07Z

I keep getting "WritableStream is not defined"

@hashitha you need to include the web-streams-polyfill to work

hashitha · 2016-08-25T13:25:50Z

Thanks @jimmywarting , I tried that now getting

StreamSaver.js:122 Something went wrong! TypeError: writeStream.write is not a function

after reading mitm I am now using 127.0.0.1 to test still not working.

This is the code I am using

<script src="lib/StreamSaver.js"></script>
<script src="lib/polyfill.min.js"></script>
<script type="text/javascript">
  var readable = new window.ReadableStream;
</script>

      var writeStream = streamSaver.createWriteStream('output.zip');
      zip.generateInternalStream({ type: "uint8array" })
        .on('data', function (data, metadata) {
          console.log(metadata);
          writeStream.write(data);
        })
        .on('error', function (e) {
          writeStream.abort(e);
        })
        .on('end', function () {
          writeStream.close();
        }).resume();

jimmywarting · 2016-08-25T14:48:23Z

Hmm, they have changed the spec since i wrote this... going to work on updating the docs

looks like you now have to call .getWriter()

So something along this:

var writeStream = streamSaver.createWriteStream('output.zip').getWriter();

jimmywarting · 2016-08-25T21:46:19Z

Updated the readme/example/dependencies with how you should deal with the new spec
You just do what i wrote above

var writeStream = streamSaver.createWriteStream('output.zip').getWriter();

hashitha · 2016-08-26T00:05:22Z

I have tried it and it seems to work except its very unstable. Browser tab crashes at least 50% of the time.

jimmywarting · 2016-08-26T00:57:17Z

Hmm, need to try it myself to figure out what is wrong. Do you have some sample code/files I can try out?

hashitha · 2016-08-26T01:27:03Z

This is a snippet similar to what I use, only different is that you need to specify the urls, I had to remove the links because they were patient files. You can add around 5 links and run it.

<script src="lib/StreamSaver.js"></script>
<script src="lib/polyfill.min.js"></script>
<script type="text/javascript">
  var readable = new window.ReadableStream;
</script>
<script src="lib/jszip.js"></script>
<script src="lib/jszip-utils.js"></script>
<script src="lib/FileSaver.js"></script>

 var zip = new JSZip();
  var urlList = []; // add the urls here
  var downloadCount = 0;
  var totalCount = urlList.length;
  urlList.forEach(function(url) {
      //download each file
      JSZipUtils.getBinaryContent(url, function(err, data) {
          if (err) {
              console.log(err);
          }
          zip.file(downloadCount + ".dcm", data, {
              binary: true
          });
          downloadCount++;
          if (downloadCount == totalCount) {
              /* old code
                            zip.generateAsync({
                                type: "blob"
                            }, function onUpdateCallback(metadata) {
                                //console.log(metadata);
                            }).then(function(content) {
                                // see FileSaver.js
                                saveAs(content, studyDetails.patientName + ".zip");
                            }); 
              */
              var writeStream = streamSaver.createWriteStream('output.zip').getWriter();
              zip.generateInternalStream({
                      type: "uint8array"
                  })
                  .on('data', function(data, metadata) {
                      //console.log(metadata);
                      writeStream.write(data);
                      /*
                                            if (metadata.percent == 100) {
                                                console.log("END");
                                                writeStream.close();
                                            }
                                            */
                  })
                  .on('error', function(e) {
                      console.log(e);
                      writeStream.abort(e);
                  })
                  .on('end', function() {
                      console.log("END");
                      writeStream.close();
                  }).resume(); 
          }
      });
  });

jimmywarting · 2016-08-26T12:47:40Z

This is what i played around with: https://jsfiddle.net/Lxwvco3u/
seems to work fine all the time

jimmywarting · 2016-08-26T14:12:07Z

When i added node stream + node buffer into the fiddle then i manage to get rid of the most memory and i was able to start saving the zip file right away and i could see the progress as i downloaded

https://jsfiddle.net/Lxwvco3u/1/

it would truly help if jszip supported Web streams (or at least the .getReader())

The key feature here is that you want to fetch the next binary file when jszip is ready to accept more data, the only way i was able to do that was with .rs._read() cuz that is the only stuff that says "hey i need more data"

Then the only thing you are going to keep in the memory then is the current file you are compressing

you pipe an ajax to jszip
jszip compresses it
jszip flush it to the streamSaver (on to the disk) dos releasing memory
the data from the ajax is garbage collected
and then you repeat the process for the rest of the files

Hence why i created #345 #346

hashitha · 2016-08-29T10:56:26Z

Your fiddle code works, but I can't seem to get it working with my urls

I am no javascript expert so most probably doing something wrong.

this is the code

function add2ZipPart(linksArray) { 
  let zip = new JSZip
  let urlList = ["https://qubs-test.s3.amazonaws.com/dummyfile.txt"] // add the urls here
  let i = 0

  //linksArray.forEach(function (image) {
   // urlList.push(image.url);
  //});

  let writeStream = streamSaver.createWriteStream('output.zip').getWriter()

  for (let url of urlList) {

   /*
    let rs = streamBrowserify.Readable()
    let reader

    rs._read = () => {
      let p = reader || (reader = fetch(url).then(res => res.body.getReader()))
      p.then(reader => reader.read().then(({ value, done }) =>
        rs.push(done ? null : new Buffer(value)))
      )
    }
*/

    fetch(url).then(res => {
        // Same as above but another way to get a readable stream
        zip.file('dummyfile.txt', res.body)
    })

    //zip.file(`${i++}.dcm`, rs)
  }

  zip.generateInternalStream({ type: "uint8array" })
    .on('data', data => writeStream.write(data))
    .on('error', err => console.error(err))
    .on('end', () => writeStream.close())
    .resume()

}

With the commented code it creates zip file with empty files inside and the other it creates zip with no files inside.

Also I am guessing in this "for loop", it downloading one url at a time and then adding to zip where as the code I posted earlier using JSZipUtils.getBinaryContent downloads urls asynchronously and adds to the zip. Only issue here is if it was to download 1 file at a time then it would take a very long time to download 2,500 files each around 300kb. I am saying this because last couple of days I wrote a Amazon Elastic Beanstalk Worker Tier Application to zip the files and upload to a s3 bucket which worked but it took around 15mins and when I changed the code to use a thread pool it finished in around 3 mins. I am still hoping to use this library so the use can start downloading straight away and not wait for a beanstalk app to finish processing.

jimmywarting · 2016-08-29T11:12:24Z

jszip don't support web streams yet... so you can't do this:

fetch(url).then(res => {
    // Same as above but another way to get a readable stream
    zip.file('dummyfile.txt', res.body)
})

This was but only a proposal i suggested for jszip to implement

jimmywarting · 2016-08-29T11:15:13Z

You can not call zip.generateInternalStream until you have added all files yet... (by calling zip.file('dummyfile.txt', data))

That is why i added a node stream that can add a pointer to all the entries so all header/metainfo gets created first

hashitha · 2016-08-29T11:24:12Z

Yea, I am only calling it when everything is added. Not sure if that's the correct way of doing it but it work.

if (downloadCount == totalCount) {

jimmywarting · 2016-08-29T11:52:35Z

Seems like you need some temporary client side storage to cache everything first.
But before going in to that you could try to download right away by just changing it a little bit

let rs = streamBrowserify.Readable()

// Start downloading right away, 
// Browser will make 6 continues download and the rest will be pending
let p = fetch(url).then(res => res.body.getReader())

rs._read = () => {
  p.then(reader => reader.read().then(({ value, done }) =>
    rs.push(done ? null : new Buffer(value)))
  )
}

I'm uncertain if it will have negative or positive effect on the download
Worst case scenario is that everything becomes pending and don't proceed since the ajax call don't get called in the right order

hashitha · 2016-08-29T11:56:33Z

Can I try this code now or do I have to wait till jszip supports web streams?

jimmywarting · 2016-08-29T12:17:56Z

My version with node-stream + node-buffer can be used right now. You can try it yourself

I tried that modification on my second fiddle And i didn't like what i saw. jszip didn't read them in the same order i added the ajax call but it kept on downloading (so it get cached somewhere on the disk i suppose)

the effect was: The saving part didn't happen right away since it didn't get the first piece first. And the progress was jumping from 87 mb downloaded - 122 mb downloaded, 155 mb downloaded
between dose it was like a long duration

It would be grate if you somehow could knew what order jszip would read the files - so you can make a ajax call in the same order

hashitha · 2016-08-29T12:27:41Z

Are you able to provide me with a snippet for node-stream + node-buffer maybe to download and zip these 2 urls
https://qubs-test.s3.amazonaws.com/dummyfile.txt
https://qubs-test.s3.amazonaws.com/dummyfile2.txt

jimmywarting · 2016-08-29T13:16:47Z

sure: https://jsfiddle.net/Lxwvco3u/2/

The highlighted code is what makes a web stream to a node stream
This time i added the modification to download it directly

Also added a CORS proxy since those file didn't have CORS headers

hashitha · 2016-08-29T13:41:13Z

Thanks @jimmywarting . Sorry forgot to add CORS to the test bucket.

I tried this code and it downloaded 2,400 files (1.2GB) in about 2 mins and all files seems to be intact inside the zip file.

Just 2 more questions I have.

Is there anyway we can get rid of the mitm.html, because just before downloading it opens another tab (mitm.html) for about 3 seconds and closes.
Currently this only works with latest chrome, Do you think they will add this to other browsers in future?

jimmywarting · 2016-08-29T19:45:22Z

If your site is running https then you won't notice the mitm. (A hidden iframe will be used instead) mitm.html is required by http in order to send a message channel to a service worker and service worker can only be registered if the top domain is secure. So the StreamSaver service is actually a part of github.io

The mitm can also be abstracted away all together but that means more configuration for developer to set it up in there own environment but requires that the site use https

Think Firefox are close to implement streams so it will come eventually. It's a whatwg standard so it's not a chrome specific thing

hashitha · 2016-08-29T23:25:35Z

Thanks, it should be okay then as our production site is https. Thanks again for all your help

jimmywarting · 2016-09-03T11:49:57Z

Think I would like to borrow this thread and reopen it if possible/necessary.

Together with StreamSaver, generateInternalStream and node streams i have been able to give data to jszip when required and save data when i receive some from jszip to get a good memory boost

That's grate in the beginning and the end but durning the time i seeded a file to jszip i got nothing back. I think it's cuz jszip tries to compress the file first?. This works okey for many smaller files

When running this jsfiddle: https://jsfiddle.net/Lxwvco3u/3/
I will see this in my console.log

(1876) reading chunk of file: 1
(1876) writing chunk
(1876) reading chunk of file: 2
(1876) writing chunk
(1876) reading chunk of file: 3
(1876) writing chunk

For this purpose compressing doesn't matter the file is not going to be upload/download or save any few extra bytes on the hard drive. This is about saving multiple files and folder from the browser since it's a lot easier to save one zip and decompress it rather then giving multiple save dialogs to same folder. So you want a fast read/write, compress/decompress as possible

So my question is more like: Can you turn of the compression and pipe the chunks directly to StreamSaver? So you will get more like this in the console log?

reading chunk of file: 1
writing chunk
reading chunk of file: 1
writing chunk
reading chunk of file: 1
writing chunk
reading chunk of file: 2
writing chunk
reading chunk of file: 2
writing chunk
reading chunk of file: 2
writing chunk
reading chunk of file: 3
writing chunk
reading chunk of file: 3
writing chunk
reading chunk of file: 3
writing chunk

Ideally i would like to maybe save a hole 4gb file + a readme or something

hashitha · 2016-09-03T12:19:49Z

On some computers, the download speed goes to 0 kbps after a few seconds where it looks like it stopped downloading but then after a minute or so the file has completed downloading. Maybe what you are suggesting is causing this as well?

jimmywarting · 2016-09-03T14:09:02Z

@hashitha, that is cuz you are not downloading the remote files in the same order as jszip wants to read them

hashitha · 2016-09-03T14:12:19Z

@jimmywarting is there any way around this? I am using the code snippet you wrote before

jimmywarting · 2016-09-03T15:02:22Z

Maybe someone from the developers can tell you what order they will be read in. Or maybe you can try to figure it out yourself by logging the filename in rs._read

rs._read = () => {
console.log(filename)
}

Then you should try to order the array of files to be downloaded to be in the same order
Otherwise you need to reserve one Ajax that's fetching the current file jszip is trying to read and the other 5 Ajax calls just randomly buffer up the memory not knowing when its going to be read

jimmywarting · 2016-09-03T15:13:03Z

My first fiddle https://jsfiddle.net/Lxwvco3u/1/ that only use 1 Ajax call at the time won't randomly buffer up the memory.
It will download the file jszip is currently trying to read. And send it to StreamSaver as it goes forward and you will see the download progressive getting bigger

But that will also be slower

jimmywarting · 2016-10-07T07:19:14Z

Since I found about about streamFiles option in generateAsync i got what i wanted and have now manage to make the most ram friendliest method possible https://jsfiddle.net/Lxwvco3u/5
So i'm okey with closing this issue now...

venkatmarepalli · 2019-07-25T22:22:18Z

I came across this thread while working on a similar problem. I am downloading from a number of urls, the solution works fine but the generateInternalStream works only after adding all the files to zip object. Can we download and stream at the same time instead of relying on system memory?

jat255 · 2020-03-19T21:48:13Z

@jimmywarting Thanks for your implementation example in the Fiddle. I was able to get it working for my use case in Chrome. Do you know why this wouldn't work in Firefox (since it seems like they implemented the stream API from what I can tell). I'm getting an error when trying to run the fiddle:

TypeError: streamSaver.writableStream is not a constructor

jimmywarting · 2020-03-19T22:49:32Z

Have kinda left jszip for my own streaming implementation now. That fiddle is also very old.

better streaming support, less ram usage. Made for downloading/saving

Also helped out to build https://github.com/transcend-io/conflux/ that have support for both reading and creating zip files using whatwg streams

jat255 · 2020-03-19T23:19:08Z

Have kinda left jszip for my own streaming implementation now. That fiddle is also very old.

https://github.com/jimmywarting/StreamSaver.js

https://jimmywarting.github.io/StreamSaver.js/examples/saving-multiple-files.html

better streaming support, less ram usage. Made for downloading/saving

Also helped out to build https://github.com/transcend-io/conflux/ that have support for both reading and creating zip files using whatwg streams

Fantastic, thank you! The example you gave here is more or less what I need to do (breaking up the file list into another zip stream if it will go over 4GB to get around that limitation).

Inderjeet96 · 2023-04-24T10:31:10Z

Have kinda left jszip for my own streaming implementation now. That fiddle is also very old.

https://github.com/jimmywarting/StreamSaver.js

https://jimmywarting.github.io/StreamSaver.js/examples/saving-multiple-files.html

better streaming support, less ram usage. Made for downloading/saving

Also helped out to build https://github.com/transcend-io/conflux/ that have support for both reading and creating zip files using whatwg streams

Hi @jimmywarting , I've used the code to download zip files "https://github.com/transcend-io/conflux/", is there any way I can get the progress bar how many files or data are zipped?

dduponchel added the performance label Aug 24, 2016

jimmywarting mentioned this issue Aug 26, 2016

Solution to writing large zip's in browsers #308

Closed

hashitha closed this as completed Aug 29, 2016

hashitha reopened this Sep 3, 2016

This was referenced Sep 24, 2016

zipping 2 million small files #227

Open

Can we observe zipping progress for file? #361

Open

jimmywarting mentioned this issue Feb 5, 2017

Generating huge files on client #394

Open

Jyang772 mentioned this issue Dec 16, 2017

Failed - Network Error #492

Open

jimmywarting mentioned this issue Sep 20, 2018

Feature/lazy blob reading #555

Open

jimmywarting mentioned this issue Aug 22, 2019

Zip64 jimmywarting/StreamSaver.js#114

Open

RangeError - when adding just over 2,400 files #343

RangeError - when adding just over 2,400 files #343

Comments

hashitha commented Aug 23, 2016

dduponchel commented Aug 23, 2016

hashitha commented Aug 24, 2016

dduponchel commented Aug 24, 2016

hashitha commented Aug 25, 2016

jimmywarting commented Aug 25, 2016

hashitha commented Aug 25, 2016

jimmywarting commented Aug 25, 2016 • edited Loading

jimmywarting commented Aug 25, 2016

hashitha commented Aug 26, 2016

jimmywarting commented Aug 26, 2016 • edited Loading

hashitha commented Aug 26, 2016

jimmywarting commented Aug 26, 2016

jimmywarting commented Aug 26, 2016 • edited Loading

hashitha commented Aug 29, 2016

jimmywarting commented Aug 29, 2016 • edited Loading

jimmywarting commented Aug 29, 2016 • edited Loading

hashitha commented Aug 29, 2016

jimmywarting commented Aug 29, 2016

hashitha commented Aug 29, 2016

jimmywarting commented Aug 29, 2016 • edited Loading

hashitha commented Aug 29, 2016

jimmywarting commented Aug 29, 2016

hashitha commented Aug 29, 2016

jimmywarting commented Aug 29, 2016 • edited Loading

hashitha commented Aug 29, 2016

jimmywarting commented Sep 3, 2016 • edited Loading

hashitha commented Sep 3, 2016

jimmywarting commented Sep 3, 2016

hashitha commented Sep 3, 2016

jimmywarting commented Sep 3, 2016 • edited Loading

jimmywarting commented Sep 3, 2016

jimmywarting commented Oct 7, 2016

venkatmarepalli commented Jul 25, 2019

jat255 commented Mar 19, 2020

jimmywarting commented Mar 19, 2020

jat255 commented Mar 19, 2020

Inderjeet96 commented Apr 24, 2023

jimmywarting commented Aug 25, 2016 •

edited

Loading

jimmywarting commented Aug 26, 2016 •

edited

Loading

jimmywarting commented Aug 26, 2016 •

edited

Loading

jimmywarting commented Aug 29, 2016 •

edited

Loading

jimmywarting commented Aug 29, 2016 •

edited

Loading

jimmywarting commented Aug 29, 2016 •

edited

Loading

jimmywarting commented Aug 29, 2016 •

edited

Loading

jimmywarting commented Sep 3, 2016 •

edited

Loading

jimmywarting commented Sep 3, 2016 •

edited

Loading