Replaced glob with readdir-glob to be memory efficient #433
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
As described in the following issue: #422, node-glob is not designed to handle a huge amount of files: it requires a quantify of memory that is proportional to the number of matched files.
Why ? Because, it lists only the folders that appears in the pattern, and it has to memorise all the found files to ensure a same file is not emitted twice.
It makes sense to proceed like this when the pattern matches only a little proportion of the filesystem.
But as it's not a common use case when creating an archive, it would be more efficient to list all the files, and then check if they match the given pattern.
Advantage:
Drawback:
archiver.glob('*.txt',{cwd: '/tmp'})
has to be used instead ofarchiver.glob('/tmp/*.txt')
The current PR implements this approach by replacing
glob
withreaddir-glob
that is memory-efficient.It also pauses the glob stream when archiving is on-going, to keep the memory usage stable.
Maybe it would be better to offer this as a new option, or only to replace the
directory()
function.Feel free to give feedback :)