doesn't work with empty file streams #44

Closed
maxogden opened this Issue Aug 22, 2013 · 8 comments

2 participants

@maxogden

i'm using archiver to implement https://github.com/maxogden/dir-tar-stream but ran into what I think is a bug

basically if I .append a fs.createReadStream('someFileOfLengthZero') then archiver won't ever finish writing the tar.
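
for reference, a minimal repro is roughly this (an untested sketch; the paths are placeholders and I'm assuming the 0.4-era createTar() constructor):

var fs = require('fs')
var archiver = require('archiver')

var archive = archiver.createTar()
archive.pipe(fs.createWriteStream('out.tar'))

// 'empty-file' is zero bytes long; with this append, the output stream
// never seems to finish
archive.append(fs.createReadStream('empty-file'), { name: 'empty-file' })
archive.finalize()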

here is my fix: https://github.com/maxogden/dir-tar-stream/blob/09eb540990aa6e4c82b852c71477a761b7408e82/index.js#L25-L31

@maxogden

hmmm, actually I am getting empty files in the resulting tarball. if I console.log('CHUNK', filename, chunk.length) inside the .on('data') handler on the fs read stream in the above linked code, I get the correct byte lengths (only files that aren't empty are shown here):

CHUNK 000050.sst 440
CHUNK 000052.sst 184
CHUNK 000055.sst 184
CHUNK CURRENT 16
CHUNK LOG 166
CHUNK LOG.old 166
CHUNK MANIFEST-000097 65536

but when I untar the tar that archiver gives me I get this:

[screenshot: the files extracted from the resulting tarball are all empty]

seems like archiver is maybe getting confused because I'm giving it a mix of readable streams and empty buffers? just a guess.

@maxogden

I was able to fix it by rewriting the above linked code as:

// emptyBuffer is a zero-length Buffer (e.g. new Buffer(0)); pending and
// finish come from the surrounding dir-tar-stream code
if (file.stat.size === 0) {
  archive.append(emptyBuffer, { name: relPath })
} else {
  pending++
  fs.createReadStream(file.path).pipe(concat(function(buf) {
    archive.append(buf, { name: relPath })
    pending--
    finish()
  }))
}

where concat is https://npmjs.org/package/concat-stream

@maxogden

also a followup question: how purely streaming is archiver? say I have 4GB of RAM and I want to create a tar stream made up of 100 1GB files, should that work? I ask because in the code in the previous comment I'm buffering the entirety of each file before appending it to the archive stream, and I was wondering whether archiver does something similar internally or whether it is more streamy.

@ctalkington
Archiver member

archiver doesn't really know the size (so it can't handle zero length internally) since it's using streams internally. actually, take that back: since it's tar mode you're referring to, we do have to know the size. it may be that the issue is that when the source is a stream it has to be collected, which waits for an end event on the source.

in theory, if you were using file streams wrapped in a lazy stream so they didn't all open at once, you could most likely achieve what you mention with more data than RAM. however, you have to remember that your system is already going to be using some of that RAM. I'm not sure anyone has actually tried this yet though.

essentially, with a streams2 fs.createReadStream, archiver should only pull file data into a RAM buffer when it's being read from and needs data to fill a buffer or answer a request. in tar mode, though, it's going to pull all of that into a separate collect buffer, since we have to know the final size to write the header (there is no central directory like in zip). so in tar mode you most likely would only be able to handle, theoretically, around a 384MB file with 1GB of free RAM (1/3 or so, depending on factors), since it'll be:

source file -> source buffer -> collect buffer -> archiver output buffer -> destination output buffer -> destination

you also have to factor in the speed at which output is handled by the destination, since RAM is inherently faster than most HDDs or broadband connections (if outputting via http). the pull-based "suction" (backpressure) system that is part of streams2 handles slower destinations pretty well, but it does so via said buffers, so it could bite you a bit.

@ctalkington
Archiver member

you really shouldn't need concat or fileData.on('data', function() { }).

archiver knows how to handle both streams and buffers, so there shouldn't be any issues with the way you have used a var to pass either one.

also, when hooking onto archiver's error event, there is no need to pass a callback to finalize() just to catch an error.
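
something along these lines is what I mean (a rough, untested sketch using the var names from your linked code):

// pass either a stream or a buffer straight to append()
var source = file.stat.size === 0
  ? new Buffer(0)
  : fs.createReadStream(file.path)

archive.append(source, { name: relPath })

// catch errors via the event rather than a finalize() callback
archive.on('error', function(err) {
  // handle err
})

archive.finalize()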

@ctalkington
Archiver member

I've added some tests and a slight adjustment for tar. it will go out sometime this weekend.

@ctalkington
Archiver member

archiver v0.4.9 has been pushed out with a check for size in tar logic. please open a new issue if this doesn't resolve things for you.

@ctalkington ctalkington closed this Sep 3, 2013
@ctalkington
Archiver member

also, you might give the following code a try

https://gist.github.com/ctalkington/6419249

It uses lazystream so that files are only actually opened right before they are read from. you could also most likely shave a bit more logic off, but I just wanted to quickly post an untested example.
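
the core of it boils down to something like this (paraphrased and untested; assumes lazystream's Readable wrapper around fs.createReadStream):

var fs = require('fs')
var lazystream = require('lazystream')

// the file isn't actually opened until archiver starts reading from it
var source = new lazystream.Readable(function() {
  return fs.createReadStream(file.path)
})

archive.append(source, { name: relPath })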
