
Input file buffers retained in memory after compression of the file has finished #213

Open
robatwilliams opened this issue May 23, 2024 · 2 comments

@robatwilliams

Context
In a browser web worker, zipping a directory containing 2,000 files totalling 2 GB. 80% of the files are under 100 KB, about 10 are 10-50 MB each, and the rest are in between. Compression takes just under 2 minutes (initial synchronous implementation); the resulting zip file is 1.8 GB.

How to reproduce

In principle:

const zip = new fflate.Zip();
const zipOutputStream = fflToRS(zip);  // ReadableStream adapter: https://github.com/101arrowz/fflate/wiki/Guide:-Modern-(Buildless)
zipOutputStream.pipeTo(targetFileStream);

// https://developer.mozilla.org/en-US/docs/Web/API/FileSystemDirectoryHandle#return_handles_for_all_files_in_a_directory
for await (const fileHandle of getTreeFileHandles(sourceDirHandle)) {
  const relativePath = await sourceDirHandle.resolve(fileHandle);
  const compressionStream = new fflate.ZipDeflate(relativePath.join('/'));
  zip.add(compressionStream);

  const file = await fileHandle.getFile();

  // Stream the file's contents into the per-file Deflate stream
  for await (const chunk of file.stream()) {
    compressionStream.push(chunk);
  }
  // Empty final chunk signals the end of this file's data
  compressionStream.push(new Uint8Array(), true);
}

zip.end();
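
For reference, fflToRS is the small adapter from the linked wiki guide. A minimal sketch of it (the wiki's exact implementation may differ) just bridges the Zip instance's ondata callback into a ReadableStream:

// Sketch, not the exact wiki code: forwards ondata(err, chunk, final) into a ReadableStream
const fflToRS = (fflStream) =>
  new ReadableStream({
    start(controller) {
      fflStream.ondata = (err, chunk, final) => {
        if (err) {
          controller.error(err);
          return;
        }
        controller.enqueue(chunk);
        if (final) controller.close();
      };
    }
  });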

The problem

Renderer process memory usage grows to over 2 GB during the compression.

Since the output is streamed to a high-performance disk chunk by chunk (origin private file system via a sync access handle), this isn't expected. Chunks should be read, compressed, and written without any data hanging around.

Looking at the worker's allocation timeline in DevTools at a random point a few seconds into the compression, I can see 500 MB of JSArrayBuffer data being retained. Most buffers are 98,304 bytes (Uint8Array) or 2,097,152 bytes (Uint16Array), and they are retained by the Deflate objects held in the u array of the Zip instance. They are buffers and other structures used for compression. It doesn't seem necessary for these to be retained in memory once a file has finished compressing.
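
A quick way to confirm this from the worker's console (relying on the same internal u and d fields described above, which are not public API):

// Number of zip entries still holding a reference to their Deflate instance and its buffers.
// Without a fix, this keeps growing with each file added, even after the file has finished.
console.log(zip.u.filter((entry) => entry.d).length);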

Workaround

Discard all references to the d Deflate object after the final compressed chunk has been emitted:

// Wrap the existing ondata handler so references to the Deflate are dropped
// once the final compressed chunk has been emitted
const ondata = compressionStream.ondata;
compressionStream.ondata = (error, data, final) => {
  ondata(error, data, final);

  if (final) {
    compressionStream.d = null;
    zip.u.at(-1).d = null; // Object created in `zip.add()`
  }
};
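
In the reproduction above, this can be packaged as a small helper applied to each file right after zip.add(). A sketch (it depends on the same internal d and u fields, so it may break in a future fflate release):

// Hedged sketch: grab this file's entry in zip.u right after the zip.add() call,
// then drop both references to its Deflate once the final chunk has been emitted.
function releaseBuffersWhenDone(zip, compressionStream) {
  const entry = zip.u.at(-1); // entry created by the preceding zip.add()
  const ondata = compressionStream.ondata;
  compressionStream.ondata = (error, data, final) => {
    ondata(error, data, final);
    if (final) {
      compressionStream.d = null;
      entry.d = null;
    }
  };
}

// Usage, inside the loop from "How to reproduce":
// zip.add(compressionStream);
// releaseBuffersWhenDone(zip, compressionStream);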

With this in place, my scenario uses 100-500 MB of renderer memory, depending on when Chrome garbage collects.

@101arrowz
Owner

Thanks for taking the time to diagnose the issue here! This looks like a good change; I'll make it for the next release.

@robatwilliams
Author

Great, thanks.

Of course, doing this workaround from client code is hacky, so I'm sure there'll be a better way to do it.
