Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

finalize promise resolved before all files are written to archive #476

Closed
JekRock opened this issue Nov 24, 2020 · 8 comments
Closed

finalize promise resolved before all files are written to archive #476

JekRock opened this issue Nov 24, 2020 · 8 comments

Comments

@JekRock
Copy link

JekRock commented Nov 24, 2020

In Express app, when sending a zip archive with big number of files (more than 100 in my case) the archive is opened with error "Unexpected end of data" in 7-zip even though all files are present and successfuly extracted from the archive.
Here is a code snipped that I used:

const someArrayOfItems: string[] = [...];

async function export(req: Request, res: Response, next: NextFunction): Promise<void> {
  res.type("application/zip");
  res.attachment("export.zip");

  const zip = archiver("zip");
  zip.pipe(res);

  let i = 0;
  for (const item of someArrayOfItems) {
      const buffer = await QRCode.toBuffer(string);
      zip.append(buffer, { name: `${i}.png` });
      i += 1;
   }

   await zip.finalize();
   await new Promise((resolve) => setTimeout(resolve, 5000));  // with this timeout zip is written correctly

   next();
}

On small numbers of items the archive is extracted without any errors both with the timeout after await zip.finalize(); and without it. But on bigger numbers, the archive is correctly extracted only when there is the timeout after await zip.finalize();.
It seems like the promise that is returned from the await zip.finalize(); is resolved before the archive is actually finalized

@JekRock
Copy link
Author

JekRock commented Nov 24, 2020

Here is the output for this snippet

const someArrayOfItems: string[] = [...];

async function export(req: Request, res: Response, next: NextFunction): Promise<void> {
  res.type("application/zip");
  res.attachment("export.zip");

  const zip = archiver("zip");
  zip.pipe(res);

  zip.on("drain", () => {
    console.log("Drained adding a file to zip");
  });
  zip.on("warning", (e) => {
    console.log(`Warning while adding a file to zip: ${e.message}`);
  });
  zip.on("finish", () => {
    console.log("Finish adding a file to zip");
  });
  zip.on("close", () => {
    console.log("closing zip");
  });
  zip.on("data", (data) => {
    console.log("on data");
  });
  zip.on("entry", (data) => {
    console.log("on entry");
  });

  let i = 0;
  for (const item of someArrayOfItems) {
      const buffer = await QRCode.toBuffer(string);
      zip.append(buffer, { name: `${i}.png` });
      i += 1;
   }

   console.log("before finalize");
   await zip.finalize();
   console.log("after finalize");

   next();
}

Output

on data
on data
on data
...
on data
before finalize
on data
...
on data
Finish adding a file to zip
after finalize
end request   // logged from a subsecuent middleware
on data
...
on data

It keeps fireing "on data" events even after the zip.finalize() promise is resolved

@jntesteves
Copy link

From the README.md:

Archiver
A streaming interface for archive generation

...

// finalize the archive (ie we are done appending files but streams have to finish yet)
// 'close', 'end' or 'finish' may be fired right after calling this method so register to them beforehand
archive.finalize();

From the docs:

finalize() → {this}

Finalizes the instance and prevents further appending to the archive structure (queue will continue til drained).

The end, close or finish events on the destination stream may fire right after calling this method so you should set listeners beforehand to properly detect stream completion.

Archiver is a streaming interface. Node streams are events-based. I don't know from where you got the idea that finalize() returns a promise.

@JekRock
Copy link
Author

JekRock commented Nov 24, 2020

@jntesteves

I don't know from where you got the idea that finalize() returns a promise

From the source code and from the TypeScript typings

What event represents the end of the archiver queue? As I understand, the "finish" or "close" event should be fired from the archiver when it finishes the finalize method, but the "finish" event is fired right after the finalize promise is resolved and the "close" event is never fired.

@JekRock
Copy link
Author

JekRock commented Nov 24, 2020

My question is why the archiver fires the finish event if it continues to pipe data to a stream?
The finish event should signal a stream that the archiver has finished piping all the data it has.

@jntesteves
Copy link

jntesteves commented Nov 24, 2020

Well, now that's a most unfortunate discovery that this was added there 3 years ago and never documented. I would say don't trust this promise because:

  1. It's undocumented;
  2. A function conditionally returning a promise is a textbook example of bad API design;
  3. It seems to listen to the end event on the producer stream, which does not guarantee the consumer have finished flushing it's internal buffers, as you have found out in this issue;
  4. If one of the early returns is hit, your promise will resolve because the function isn't returning an error, even though an error event is being emitted.

Fixing this would be an API-breaking change.

I hope this is enough to convince you to just not use it. As I said in a comment on the other issue you commented on, archiver is just a queueing helper on top of zip-stream/tar-stream. I don't even use archiver as I prefer to just do my own queueing and I don't like archiver's API, it's old and has too many problems as you've just found out.

EDIT: This event on the consumer stream is what you should probably be listenning to: https://nodejs.org/api/stream.html#stream_event_finish

EDIT2: These events are not from archiver, they are from the node streams that you are piping to/from archiver.

@JekRock
Copy link
Author

JekRock commented Nov 24, 2020

@jntesteves now it makes sense to me. This promise from the finalize is very confusing. Probably I should also use my own queueing.
Thank you for the explanation!

@JekRock JekRock closed this as completed Nov 24, 2020
mcasperson added a commit to OctopusSamples/content-team-apps that referenced this issue Mar 9, 2022
@buddypia
Copy link

buddypia commented May 7, 2023

I'm resolve it.
my code

function compressImagesToZip(directoryPath: string, zipFilePath: string): Promise<void> {
  return new Promise((resolve, reject) => {
    const output = fs.createWriteStream(zipFilePath);
    const archive = archiver('zip', {
      zlib: { level: 9 }
    });

    output.on('close', () => {
      resolve();
    });

    archive.on('error', (err: Error) => {
      reject(err);
    });

    archive.pipe(output);

    const files = fs.readdirSync(directoryPath);
    files.forEach((file) => {
      const filePath = path.join(directoryPath, file);
      if (fs.statSync(filePath).isFile() && isImageFile(filePath)) {
        const fileName = path.basename(filePath);
        archive.file(filePath, { name: fileName });
      }
    });

    archive.finalize();
  });
}

@adevine
Copy link

adevine commented Nov 3, 2023

Hey folks, I know this has been closed but:

The fact that streaming isn't completed even after calling await archive.finalize() is a giant footgun, and lots of users have this problem, e.g. #663.

In my own code I have a createZipArchive helper function that returns an Archiver that pipes to a file, but I specifically override the finalize function on the returned object to basically this (modeled off @shoridevel 's comment above):

    const thisArchive = archive;
    const streamingCompletedPromise = new Promise<void>((resolve, reject) => {
        outputZip.on('close', () => resolve()); // the stream previously defined
        thisArchive.on('error', (err: Error) => reject(err));
    });
    const originalArchiverFinalize = archive.finalize;
    archive.finalize = async () => {
        await originalArchiverFinalize.call(thisArchive);
        await streamingCompletedPromise;
    };

Wouldn't it be possible to just update the code so that the normal finalize function does wait for streaming to finish once the streams have drained? Would there be any downsides to this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants