
Replaced glob with readdir-glob to be memory efficient #433

Merged: 1 commit into archiverjs:master on Jul 23, 2020

Conversation

@Yqnn (Contributor) commented Jul 22, 2020

As described in issue #422, node-glob is not designed to handle a huge number of files: it requires an amount of memory proportional to the number of matched files.

Why? Because it lists only the folders that appear in the pattern, and it has to memorize every file it has found to ensure the same file is not emitted twice.
That approach makes sense when the pattern matches only a small proportion of the filesystem.
But since that is not the common case when creating an archive, it is more efficient to list all the files and then check whether each one matches the given pattern.
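
A minimal sketch of that "walk everything, then match" idea, assuming the classic minimatch package as the pattern matcher (it is not part of this PR, just used for illustration):

```js
const fs = require('fs');
const path = require('path');
const minimatch = require('minimatch'); // pre-v9 export style assumed

function* walk(dir) {
  // Recursively yield every file under dir, one path at a time:
  // nothing is accumulated, so memory usage stays constant.
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) yield* walk(full);
    else yield full;
  }
}

for (const file of walk('.')) {
  if (minimatch(file, '**/*.txt')) console.log('match:', file);
}
```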

Advantages:

  • memory consumption is fixed, no matter the number of matched files
  • it's faster when the proportion of matching files is high

Drawbacks:

  • absolute glob patterns are not supported: archiver.glob('*.txt', {cwd: '/tmp'}) has to be used instead of archiver.glob('/tmp/*.txt') (see the example after this list)
  • it's slower when only a few files match in a big filesystem
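
A short example of the two call styles, using archiver's public glob() API (the file names here are illustrative):

```js
const fs = require('fs');
const archiver = require('archiver');

const archive = archiver('zip');
archive.pipe(fs.createWriteStream('out.zip'));

// Supported: a relative pattern, with the root given via the cwd option.
archive.glob('*.txt', { cwd: '/tmp' });

// Not supported after this change: an absolute pattern.
// archive.glob('/tmp/*.txt');

archive.finalize();
```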

This PR implements that approach by replacing glob with readdir-glob, which is memory-efficient.
It also pauses the glob stream while archiving is in progress, to keep memory usage stable.
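
Roughly, the pause/resume pattern looks like the sketch below; the event and method names follow readdir-glob's README, and archiveEntry() is a hypothetical placeholder for the archiving step:

```js
const readdirGlob = require('readdir-glob');

const globber = readdirGlob('/tmp', { pattern: '**/*.txt' });

globber.on('match', match => {
  // While the archiver is busy, pausing stops the directory walk,
  // so pending matches never pile up in memory.
  globber.pause();
  archiveEntry(match.relative, () => globber.resume()); // hypothetical helper
});
globber.on('end', () => console.log('done'));
globber.on('error', err => console.error(err));
```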

Maybe it would be better to offer this as a new option, or to make the change only in the directory() function.
Feel free to give feedback :)

…addir stream is paused when archiving is on-going
@ctalkington merged commit a4c4507 into archiverjs:master on Jul 23, 2020
@melitus commented Jul 23, 2020

@Yqnn Is memory consumption also fixed for archiver.append(file, {name: filename})?

@Yqnn (Contributor, Author) commented Jul 23, 2020

> @Yqnn Is memory consumption also fixed for archiver.append(file, {name: filename})?

Nope. If you use append(), the library has to remember all the appended files, so it's not possible to make it memory-efficient in that case.
You have to put a throttling mechanism in place on your side. I guess you could check whether archiver._queue.length is small enough before appending new files.
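
A minimal sketch of that throttling idea. Note that _queue is a private field (an async.queue in archiver's core), so this relies on internals that may change between releases, and the threshold is an arbitrary illustration:

```js
const MAX_PENDING = 100; // hypothetical threshold, tune for your workload

async function appendThrottled(archive, buffer, filename) {
  // In the async library, queue.length() returns the number of tasks
  // still waiting, so poll until the queue has drained below the cap.
  while (archive._queue.length() >= MAX_PENDING) {
    await new Promise(resolve => setTimeout(resolve, 50));
  }
  archive.append(buffer, { name: filename });
}
```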

@melitus commented Jul 23, 2020

@Yqnn Thanks. But is there any alternative to append()? I am passing a buffer and a filename to append().
