
RangeError - when adding just over 2,400 files #343

Open · hashitha opened this issue Aug 23, 2016 · 37 comments

Comments
@hashitha

I am getting a RangeError when I add more than 2,400 files. The total size of the final zip file would be around 1.3 GB; it works fine for anything less than that.

zip.generateAsync({ type: "blob" }, function onUpdateCallback(metadata) {
    console.log(metadata); // This goes to 100%, then RangeError
}).then(function (content) {
    saveAs(content, "test.zip");
});

[screenshot: RangeError thrown in the console after the progress reaches 100%]

@dduponchel (Collaborator)

To generate a 1.3 GB zip file with generateAsync, JSZip currently needs roughly three times that amount of memory:

  • one copy for the original, uncompressed content
  • one copy for the intermediary chunks
  • one copy for the final zip file

The first could be mitigated by fixing #290. The second is probably the easiest to improve. The third is needed by generateAsync.

Could you describe your use case? What are these 2,400 files: files generated client side, or real files from an <input type="file" multiple />? Do you add them as a Blob, an ArrayBuffer, a String, etc.?

@hashitha (Author)

Thanks for the info.

Basically we are writing a medical app that downloads patient CT, ultrasound, X-ray, MRI, etc. Some CTs and MRIs can have over 2k images (DICOM files). When the user clicks download, I use JSZipUtils.getBinaryContent to download the individual images and then combine all of them into one zip file (without compression) using JSZip. This works really well until we hit this limit; then it throws an exception at the very end, just after onUpdateCallback reports 100%. I am adding them as Blobs.

We have written all our back-end code in AWS Lambda, and Lambda has a limit of 500 MB of disk space, so I am unable to compress the files there and serve them. Our only other option is to have a server running, which I really don't want to do after all the work we have done to go serverless.

@dduponchel (Collaborator)

Do you have a constraint on the browser? Using generateInternalStream instead of generateAsync would remove a lot of memory pressure, but it needs a browser supporting web streams (only Chrome right now). Then, with StreamSaver.js, save the stream. The stream APIs are not compatible, but it should be easy to convert them (disclaimer: I never tested this code):

var writeStream = streamSaver.createWriteStream('output.zip');

zip
  .generateInternalStream({ type: "uint8array" })
  .on('data', function (data, metadata) {
    writeStream.write(data);
  })
  .on('error', function (e) {
    writeStream.abort(e);
  })
  .on('end', function () {
    writeStream.close();
  })
  .resume();

Without streams, I don't see any magical solution. I'm modifying the accumulate method to improve memory usage with blobs (which seem to be moved out of memory under pressure), but the results will depend on the browser implementation.

@hashitha (Author)

I have tried it with the latest Chrome and I keep getting "WritableStream is not defined"; even the StreamSaver examples don't work.

I think our only other option is to zip part by part, e.g. every 1,000 images we create one zip file and give the user part1.zip, part2.zip, etc. (a rough sketch follows below). It is an inconvenience to the user, but I don't see any other way at this stage.
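A rough sketch of that part-by-part fallback, reusing the generateAsync/saveAs pattern from the first comment (the chunk size, the downloadedData lookup, and the part naming are illustrative assumptions, not code from this thread):

var CHUNK_SIZE = 1000;
var chunks = [];
for (var i = 0; i < urlList.length; i += CHUNK_SIZE) {
    chunks.push(urlList.slice(i, i + CHUNK_SIZE));
}

// Build and save the parts sequentially, so only one part's data
// is held by JSZip at a time.
chunks.reduce(function (prev, chunk, part) {
    return prev.then(function () {
        var zip = new JSZip();
        chunk.forEach(function (url, j) {
            // downloadedData is a hypothetical url -> binary data map
            zip.file((part * CHUNK_SIZE + j) + ".dcm", downloadedData[url], { binary: true });
        });
        return zip.generateAsync({ type: "blob" }).then(function (content) {
            saveAs(content, "part" + (part + 1) + ".zip");
        });
    });
}, Promise.resolve());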

@jimmywarting (Contributor)

I keep getting "WritableStream is not defined"

@hashitha you need to include the web-streams-polyfill for it to work

@hashitha (Author)

Thanks @jimmywarting, I tried that; now I'm getting

StreamSaver.js:122 Something went wrong! TypeError: writeStream.write is not a function

After reading about mitm I am now testing on 127.0.0.1; still not working.

This is the code I am using:

<script src="lib/StreamSaver.js"></script>
<script src="lib/polyfill.min.js"></script>
<script type="text/javascript">
  var readable = new window.ReadableStream;
</script>
      var writeStream = streamSaver.createWriteStream('output.zip');
      zip.generateInternalStream({ type: "uint8array" })
        .on('data', function (data, metadata) {
          console.log(metadata);
          writeStream.write(data);
        })
        .on('error', function (e) {
          writeStream.abort(e);
        })
        .on('end', function () {
          writeStream.close();
        }).resume(); 

@jimmywarting (Contributor)

jimmywarting commented Aug 25, 2016

Hmm, they have changed the spec since I wrote this... going to work on updating the docs.

It looks like you now have to call .getWriter(). So something along these lines:

var writeStream = streamSaver.createWriteStream('output.zip').getWriter();

@jimmywarting (Contributor)

I updated the readme/example/dependencies with how to deal with the new spec. You just do what I wrote above:

var writeStream = streamSaver.createWriteStream('output.zip').getWriter();
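Put together with the earlier generateInternalStream example, the updated pattern would presumably look like this (an untested sketch, same caveat as before):

var writer = streamSaver.createWriteStream('output.zip').getWriter();

zip.generateInternalStream({ type: "uint8array" })
    .on('data', function (data) {
        writer.write(data);
    })
    .on('error', function (e) {
        writer.abort(e);
    })
    .on('end', function () {
        writer.close();
    })
    .resume();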

@hashitha (Author)

I have tried it and it seems to work, except it's very unstable. The browser tab crashes at least 50% of the time.

@jimmywarting (Contributor)

jimmywarting commented Aug 26, 2016

Hmm, I need to try it myself to figure out what is wrong. Do you have some sample code/files I can try out?

@hashitha (Author)

This is a snippet similar to what I use; the only difference is that you need to specify the URLs (I had to remove the links because they were patient files). You can add around 5 links and run it.

<script src="lib/StreamSaver.js"></script>
<script src="lib/polyfill.min.js"></script>
<script type="text/javascript">
  var readable = new window.ReadableStream;
</script>
<script src="lib/jszip.js"></script>
<script src="lib/jszip-utils.js"></script>
<script src="lib/FileSaver.js"></script>
 var zip = new JSZip();
  var urlList = []; // add the urls here
  var downloadCount = 0;
  var totalCount = urlList.length;
  urlList.forEach(function(url) {
      //download each file
      JSZipUtils.getBinaryContent(url, function(err, data) {
          if (err) {
              console.log(err);
          }
          zip.file(downloadCount + ".dcm", data, {
              binary: true
          });
          downloadCount++;
          if (downloadCount == totalCount) {
              /* old code
                            zip.generateAsync({
                                type: "blob"
                            }, function onUpdateCallback(metadata) {
                                //console.log(metadata);
                            }).then(function(content) {
                                // see FileSaver.js
                                saveAs(content, studyDetails.patientName + ".zip");
                            }); 
              */
              var writeStream = streamSaver.createWriteStream('output.zip').getWriter();
              zip.generateInternalStream({
                      type: "uint8array"
                  })
                  .on('data', function(data, metadata) {
                      //console.log(metadata);
                      writeStream.write(data);
                      /*
                                            if (metadata.percent == 100) {
                                                console.log("END");
                                                writeStream.close();
                                            }
                                            */
                  })
                  .on('error', function(e) {
                      console.log(e);
                      writeStream.abort(e);
                  })
                  .on('end', function() {
                      console.log("END");
                      writeStream.close();
                  }).resume(); 
          }
      });
  });

@jimmywarting (Contributor)

This is what I played around with: https://jsfiddle.net/Lxwvco3u/
It seems to work fine every time.

@jimmywarting (Contributor)

jimmywarting commented Aug 26, 2016

When I added a Node stream + Node buffer into the fiddle, I managed to get rid of most of the memory use; I was able to start saving the zip file right away and could see the progress as it downloaded.

https://jsfiddle.net/Lxwvco3u/1/

It would truly help if JSZip supported web streams (or at least .getReader()).


The key feature here is that you want to fetch the next binary file when JSZip is ready to accept more data; the only way I was able to do that was with rs._read(), because that is the only hook that says "hey, I need more data".

Then the only thing you keep in memory is the file you are currently compressing:

  • you pipe an ajax response to JSZip
  • JSZip compresses it
  • JSZip flushes it to StreamSaver (onto the disk), thus releasing memory
  • the data from the ajax call is garbage collected
  • and then you repeat the process for the rest of the files

Hence why I created #345 and #346.
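A minimal sketch of that pull model for a single entry, along the lines of the fiddle above (it assumes stream-browserify is bundled as streamBrowserify and Node's Buffer is available; fuller versions appear later in this thread):

// JSZip pulls via rs._read(); only then is the next network chunk
// requested, so roughly one file's chunks are in memory at a time.
let rs = streamBrowserify.Readable()
let reader

rs._read = () => {
  let p = reader || (reader = fetch(url).then(res => res.body.getReader()))
  p.then(reader => reader.read().then(({ value, done }) =>
    rs.push(done ? null : new Buffer(value)))
  )
}

zip.file('0.dcm', rs)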

@hashitha (Author)

Your fiddle code works, but I can't seem to get it working with my URLs. I am no JavaScript expert, so I'm most probably doing something wrong.

This is the code:

function add2ZipPart(linksArray) {
  let zip = new JSZip()
  let urlList = ["https://qubs-test.s3.amazonaws.com/dummyfile.txt"] // add the urls here
  let i = 0

  // linksArray.forEach(function (image) {
  //   urlList.push(image.url)
  // })

  let writeStream = streamSaver.createWriteStream('output.zip').getWriter()

  for (let url of urlList) {
    /*
    let rs = streamBrowserify.Readable()
    let reader

    rs._read = () => {
      let p = reader || (reader = fetch(url).then(res => res.body.getReader()))
      p.then(reader => reader.read().then(({ value, done }) =>
        rs.push(done ? null : new Buffer(value)))
      )
    }
    */

    fetch(url).then(res => {
      // Same as above but another way to get a readable stream
      zip.file('dummyfile.txt', res.body)
    })

    // zip.file(`${i++}.dcm`, rs)
  }

  zip.generateInternalStream({ type: "uint8array" })
    .on('data', data => writeStream.write(data))
    .on('error', err => console.error(err))
    .on('end', () => writeStream.close())
    .resume()
}

With the commented-out code enabled, it creates a zip with empty files inside; with the other version, it creates a zip with no files inside.

Also, I am guessing that in this for loop it downloads one URL at a time and then adds it to the zip, whereas the code I posted earlier using JSZipUtils.getBinaryContent downloads the URLs asynchronously and adds them to the zip. The only issue is that downloading one file at a time would take a very long time for 2,500 files of around 300 KB each. I say this because over the last couple of days I wrote an Amazon Elastic Beanstalk worker-tier application to zip the files and upload them to an S3 bucket; it worked, but it took around 15 minutes, and when I changed the code to use a thread pool it finished in around 3 minutes. I am still hoping to use this library so the user can start downloading straight away instead of waiting for a Beanstalk app to finish processing.
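One hypothetical middle ground between fully sequential and fully parallel downloads is a small worker pool. This sketch (every name in it is illustrative, not from this thread) buffers all the results in memory before zipping, so it trades RAM for speed, which is exactly the tension described above:

// Download `urls` with at most `limit` requests in flight,
// keeping results in their original order.
function fetchAllWithLimit(urls, limit) {
    var next = 0;
    var results = new Array(urls.length);
    function worker() {
        if (next >= urls.length) return Promise.resolve();
        var i = next++;
        return fetch(urls[i])
            .then(function (res) { return res.arrayBuffer(); })
            .then(function (buf) { results[i] = buf; })
            .then(worker);
    }
    var workers = [];
    for (var k = 0; k < Math.min(limit, urls.length); k++) {
        workers.push(worker());
    }
    return Promise.all(workers).then(function () { return results; });
}

fetchAllWithLimit(urlList, 6).then(function (buffers) {
    buffers.forEach(function (buf, i) { zip.file(i + ".dcm", buf); });
    // ...then generate the zip as above
});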

@jimmywarting (Contributor)

jimmywarting commented Aug 29, 2016

JSZip doesn't support web streams yet... so you can't do this:

fetch(url).then(res => {
    // Same as above but another way to get a readable stream
    zip.file('dummyfile.txt', res.body)
})

That was only a proposal I suggested for JSZip to implement.

@jimmywarting (Contributor)

jimmywarting commented Aug 29, 2016

You cannot call zip.generateInternalStream until you have added all the files (by calling zip.file('dummyfile.txt', data)).

That is why I added a Node stream that acts as a pointer to each entry, so all the header/meta info gets created first.
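In other words, the safe ordering is roughly this sketch (lazyNodeStreamFor is a hypothetical stand-in for the rs construction shown earlier):

// 1. Register every entry first; JSZip needs the complete list of
//    entries before generateInternalStream starts emitting.
urlList.forEach(function (url, i) {
    zip.file(i + ".dcm", lazyNodeStreamFor(url)); // hypothetical helper
});

// 2. Only then start streaming the archive out.
zip.generateInternalStream({ type: "uint8array" })
    .on('data', function (data) { writer.write(data); })
    .on('end', function () { writer.close(); })
    .resume();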

@hashitha (Author)

Yeah, I am only calling it when everything has been added. Not sure if that's the correct way of doing it, but it works:

if (downloadCount == totalCount) {

@jimmywarting (Contributor)

It seems like you need some temporary client-side storage to cache everything first. But before going into that, you could try to start downloading right away by changing it a little bit:

let rs = streamBrowserify.Readable()

// Start downloading right away.
// The browser will run 6 concurrent downloads; the rest will be pending.
let p = fetch(url).then(res => res.body.getReader())

rs._read = () => {
  p.then(reader => reader.read().then(({ value, done }) =>
    rs.push(done ? null : new Buffer(value)))
  )
}

I'm uncertain whether it will have a negative or positive effect on the download. The worst-case scenario is that everything becomes pending and doesn't proceed, since the ajax calls don't complete in the right order.

@hashitha (Author)

Can I try this code now, or do I have to wait until JSZip supports web streams?

@jimmywarting (Contributor)

jimmywarting commented Aug 29, 2016

My version with node-stream + node-buffer can be used right now. You can try it yourself.

I tried that modification on my second fiddle, and I didn't like what I saw. JSZip didn't read the files in the same order I issued the ajax calls, but it kept on downloading (so the data gets cached somewhere on disk, I suppose).

The effect was: the saving part didn't happen right away, since it didn't get the first piece first. And the progress jumped from 87 MB downloaded to 122 MB to 155 MB, with long pauses in between.

It would be great if you could somehow know what order JSZip will read the files in, so you could make the ajax calls in the same order.

@hashitha (Author)

Are you able to provide me with a snippet for node-stream + node-buffer, maybe to download and zip these 2 URLs?
https://qubs-test.s3.amazonaws.com/dummyfile.txt
https://qubs-test.s3.amazonaws.com/dummyfile2.txt

@jimmywarting (Contributor)

Sure: https://jsfiddle.net/Lxwvco3u/2/

[screenshot of the fiddle code, with the web-stream-to-node-stream conversion highlighted]

The highlighted code is what turns a web stream into a Node stream. This time I added the modification to download directly.

I also added a CORS proxy, since those files didn't have CORS headers.

@hashitha (Author)

Thanks @jimmywarting. Sorry, I forgot to add CORS to the test bucket.

I tried this code and it downloaded 2,400 files (1.2 GB) in about 2 minutes, and all the files seem to be intact inside the zip file.

Just 2 more questions:

  1. Is there any way we can get rid of mitm.html? Just before downloading, it opens another tab (mitm.html) for about 3 seconds and then closes it.
  2. Currently this only works with the latest Chrome. Do you think other browsers will add this in the future?

@jimmywarting (Contributor)

jimmywarting commented Aug 29, 2016

If your site is running https then you won't notice the mitm (a hidden iframe will be used instead). mitm.html is required over http in order to open a message channel to a service worker, and a service worker can only be registered if the top domain is secure. So the StreamSaver service is actually a part of github.io.

The mitm can also be abstracted away altogether, but that means more configuration for the developer to set it up in their own environment, and it requires that the site use https.

I think Firefox is close to implementing streams, so it will come eventually. It's a WHATWG standard, so it's not a Chrome-specific thing.

@hashitha (Author)

Thanks, it should be okay then, as our production site is https. Thanks again for all your help.

@jimmywarting (Contributor)

jimmywarting commented Sep 3, 2016

I think I would like to borrow this thread and reopen it, if possible/necessary.

Together with StreamSaver, generateInternalStream and Node streams, I have been able to feed data to JSZip when required and save data whenever I receive some from JSZip, which gives a good memory boost.

That's great at the beginning and the end, but while I was feeding a file to JSZip I got nothing back; I think that's because JSZip tries to compress the file first? This works okay for many smaller files.

When running this jsfiddle: https://jsfiddle.net/Lxwvco3u/3/
I see this in my console.log:

(1876) reading chunk of file: 1
(1876) writing chunk
(1876) reading chunk of file: 2
(1876) writing chunk
(1876) reading chunk of file: 3
(1876) writing chunk

For this purpose compression doesn't matter; the file is not going to be uploaded or downloaded, and saving a few extra bytes on the hard drive isn't the point. This is about saving multiple files and folders from the browser, since it's a lot easier to save one zip and decompress it rather than presenting multiple save dialogs for the same folder. So you want reads/writes and compression/decompression to be as fast as possible.

So my question is more like: can you turn off the compression and pipe the chunks directly to StreamSaver, so you get something more like this in the console log?

reading chunk of file: 1
writing chunk
reading chunk of file: 1
writing chunk
reading chunk of file: 1
writing chunk
reading chunk of file: 2
writing chunk
reading chunk of file: 2
writing chunk
reading chunk of file: 2
writing chunk
reading chunk of file: 3
writing chunk
reading chunk of file: 3
writing chunk
reading chunk of file: 3
writing chunk

Ideally I would like to be able to save a whole 4 GB file + a readme or something.
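(For reference, JSZip does accept a STORE compression method that skips DEFLATE entirely; whether that alone removes the buffering observed here is a separate question, which the streamFiles option discussed further down turned out to answer:)

// STORE writes entries uncompressed, so no time is spent in the
// compressor before chunks can be handed to StreamSaver.
zip.generateInternalStream({ type: "uint8array", compression: "STORE" })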

@hashitha reopened this Sep 3, 2016

@hashitha (Author)

hashitha commented Sep 3, 2016

On some computers, the download speed drops to 0 kbps after a few seconds and it looks like the download has stopped, but then after a minute or so the file completes downloading. Maybe what you are describing is causing this as well?

@jimmywarting (Contributor)

@hashitha, that is because you are not downloading the remote files in the same order as JSZip wants to read them.

@hashitha (Author)

hashitha commented Sep 3, 2016

@jimmywarting is there any way around this? I am using the code snippet you wrote before.

@jimmywarting (Contributor)

jimmywarting commented Sep 3, 2016

Maybe one of the developers can tell you what order they will be read in. Or maybe you can figure it out yourself by logging the filename in rs._read:

rs._read = () => {
  console.log(filename)
}

Then you should try to order the array of files to be downloaded so it matches that order. Otherwise you need to reserve one ajax call for fetching the file JSZip is currently trying to read, while the other 5 ajax calls just buffer up memory at random, not knowing when their data will be read.

@jimmywarting (Contributor)

My first fiddle, https://jsfiddle.net/Lxwvco3u/1/, which only uses 1 ajax call at a time, won't buffer up memory at random. It will download the file JSZip is currently trying to read and send it to StreamSaver as it goes, and you will see the download progress getting bigger.

But that will also be slower.

@jimmywarting (Contributor)

Since I found out about the streamFiles option in generateAsync, I got what I wanted and have now managed to make the most RAM-friendly method possible: https://jsfiddle.net/Lxwvco3u/5
So I'm okay with closing this issue now...
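For context, streamFiles is a real option on generateAsync/generateInternalStream; combined with the writer from before, the shape would be something like this (untested sketch):

// With streamFiles: true, JSZip writes each entry with a data
// descriptor, so a file's chunks can be emitted as they arrive
// instead of being buffered to compute the header first.
zip.generateInternalStream({ type: "uint8array", streamFiles: true })
    .on('data', function (data) { writer.write(data); })
    .on('error', function (e) { writer.abort(e); })
    .on('end', function () { writer.close(); })
    .resume();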

@venkatmarepalli

I came across this thread while working on a similar problem. I am downloading from a number of URLs; the solution works fine, but generateInternalStream works only after all the files have been added to the zip object. Can we download and stream at the same time, instead of relying on system memory?

@jat255

jat255 commented Mar 19, 2020

@jimmywarting Thanks for your implementation example in the fiddle. I was able to get it working for my use case in Chrome. Do you know why this wouldn't work in Firefox (since, from what I can tell, they have implemented the stream API)? I'm getting an error when trying to run the fiddle:

TypeError: streamSaver.writableStream is not a constructor

@jimmywarting (Contributor)

I have kinda left JSZip for my own streaming implementation now. That fiddle is also very old.

Better streaming support, less RAM usage, made for downloading/saving.

I also helped build https://github.com/transcend-io/conflux/, which supports both reading and creating zip files using whatwg streams.

@jat255

jat255 commented Mar 19, 2020

I have kinda left JSZip for my own streaming implementation now. That fiddle is also very old.

Better streaming support, less RAM usage, made for downloading/saving.

I also helped build https://github.com/transcend-io/conflux/, which supports both reading and creating zip files using whatwg streams.

Fantastic, thank you! The example you gave here is more or less what I need to do (breaking up the file list into another zip stream if it will go over 4 GB, to get around that limitation); a sketch of that split follows below.
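A hypothetical sketch of that size-based split (it assumes each file's uncompressed size is known up front; 4 GB is the classic zip32 limit, and all names here are illustrative):

// Start a new archive whenever adding the next file would push the
// running total of raw sizes past the limit.
const LIMIT = 4 * 1024 * 1024 * 1024; // ~4 GB
const groups = [[]];
let total = 0;

for (const f of files) { // files: [{ name, size, data }]
  if (total + f.size > LIMIT && groups[groups.length - 1].length > 0) {
    groups.push([]);
    total = 0;
  }
  groups[groups.length - 1].push(f);
  total += f.size;
}
// Each group then becomes its own zip stream: part1.zip, part2.zip, ...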

@Inderjeet96

I have kinda left JSZip for my own streaming implementation now. That fiddle is also very old.

Better streaming support, less RAM usage, made for downloading/saving.

I also helped build https://github.com/transcend-io/conflux/, which supports both reading and creating zip files using whatwg streams.

Hi @jimmywarting, I've used the code from https://github.com/transcend-io/conflux/ to download zip files. Is there any way I can show a progress bar for how many files or how much data has been zipped?
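Since conflux works with whatwg streams (per the link above), one library-agnostic way to drive a progress bar is to count bytes in a pass-through TransformStream. A sketch, where updateProgressBar is a hypothetical UI callback and zipStream/fileStream stand for the zip readable and the destination writable:

// Count the bytes flowing from the zip stream to the file writer.
let written = 0;
const counter = new TransformStream({
  transform(chunk, controller) {
    written += chunk.byteLength;
    updateProgressBar(written); // hypothetical UI hook
    controller.enqueue(chunk);
  }
});

zipStream.pipeThrough(counter).pipeTo(fileStream);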
