Add zip64 #211
This adds zip64 writing support.
Zip64 is implemented by appending a set of "extra" values to the headers.
The central header is simple enough, and most implementations rely on it and mostly ignore the local header. By the time we write the central header we have all the information required, so we can just write it.
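For reference, here is a minimal Python sketch of the central-directory side. It assumes the field rules in PKWARE's APPNOTE (section 4.5.3): only values that do not fit in 32 bits go into the zip64 extra field, in a fixed order, and the matching 32-bit header field holds 0xFFFFFFFF as a marker. The function name is hypothetical, and the disk-start-number field is left out for brevity.

```python
import struct

ZIP64_EXTRA_ID = 0x0001  # header ID of the zip64 extended information field
MAX32 = 0xFFFFFFFF       # also the "look in the zip64 extra" marker value


def central_zip64_extra(uncompressed, compressed, local_header_offset):
    """Return (extra, u32, c32, off32) for one central directory record.

    Values that do not fit in 32 bits (or collide with the marker) are moved
    into the extra field in the fixed order uncompressed, compressed, offset,
    and the corresponding 32-bit field is set to 0xFFFFFFFF.
    """
    payload = b""
    fields = []
    for value in (uncompressed, compressed, local_header_offset):
        if value >= MAX32:
            payload += struct.pack("<Q", value)
            fields.append(MAX32)
        else:
            fields.append(value)
    extra = b""
    if payload:
        # header ID (2 bytes) + data size (2 bytes) + the 8-byte values
        extra = struct.pack("<HH", ZIP64_EXTRA_ID, len(payload)) + payload
    return (extra, *fields)
```

Small entries get no extra field at all, which is why writing the central header is the easy part: every size is known by then.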
For the local header, there is a tradeoff. The "extra" bytes take up 2+2+8+8=20 bytes per entry. This header is only required if either stream size (compressed or uncompressed) exceeds
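The 20-byte figure can be checked with a small Python sketch (the function name is hypothetical). Unlike the central record, the local header's zip64 extra field must carry both sizes, so its length is fixed:

```python
import struct

ZIP64_EXTRA_ID = 0x0001


def local_zip64_extra(uncompressed, compressed):
    """Zip64 extra field for a local file header.

    2 (header ID) + 2 (data size) + 8 (uncompressed) + 8 (compressed) = 20 bytes.
    The local header's own 32-bit size fields would then hold 0xFFFFFFFF.
    """
    return struct.pack("<HHQQ", ZIP64_EXTRA_ID, 16, uncompressed, compressed)


extra = local_zip64_extra(5 * 2**30, 4 * 2**30)
print(len(extra))  # 20
```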
The dilemma is: do we write it for all files, in case one is too long? Or do we not write it and risk overflowing the values?
Since the header is "mostly ignored" we could live with this being broken. On the other hand, if we can use it we should.
I have added a hint value to the
If the stream is non-seekable, we have another issue: normally the writer would set a flag and then write the CRC, uncompressed size, and compressed size in a special trailing header. This trailing header has not been updated for zip64, so we cannot write the correct values in it. We also cannot use both trailing headers and "extra" data. This was clarified by PKWare: https://blogs.oracle.com/xuemingshen/entry/is_zipinput_outputstream_handling_of
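To illustrate the streaming problem, here is a minimal sketch of the classic data descriptor that follows the entry data when general purpose bit 3 is set. The CRC-32 and both sizes are 32 bits wide, so there is simply nowhere to put a larger size. The function name is hypothetical; the optional signature 0x08074B50 is included, as most writers do.

```python
import struct
import zlib

DATA_DESCRIPTOR_SIG = 0x08074B50  # optional signature preceding the descriptor


def data_descriptor(crc32, compressed, uncompressed):
    """Classic (non-zip64) data descriptor: CRC-32 and both sizes are 32 bits."""
    if compressed > 0xFFFFFFFF or uncompressed > 0xFFFFFFFF:
        # struct.pack would reject these too; make the limitation explicit
        raise OverflowError("sizes do not fit in a 32-bit data descriptor")
    return struct.pack("<IIII", DATA_DESCRIPTOR_SIG, crc32, compressed, uncompressed)


payload = b"hello"
desc = data_descriptor(zlib.crc32(payload), len(payload), len(payload))
print(len(desc))  # 16
```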
In the streaming case, the local headers are written with the trailing data, which may overflow. The headers do contain the CRC-32 value, though, and may contain correct data if the sizes are less than
Not sure how to deal with testing, as it requires files over 4GiB to hit the limitations.
So if I understand correctly, there's no post-data descriptor for zip64, but the central directory has the values?
Then you can't use streaming reads on zip64?
This is pretty shitty. I guess they updated APPNOTE.txt:
So there's currently no way to forward-only read entries out of a zip64.
Would a good test just be forcing zip64 on a small file? The values being overloaded shouldn't really matter.
Yes, that is my understanding: streaming cannot write 64bit values in the post-data.
I know they previously said that the post-data header should have 8-byte values for zip64, but they forgot to mention how to tell whether the file is zip64 when reading forward-only. The update you mention probably reflects the idea that there should be some kind of identifier in the post-data region.
You can still use forward-only reading, but you cannot verify if the lengths are correct, if the stream is larger than 4GiB.
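A quick worked example of why the lengths cannot be verified, assuming a writer that simply keeps the low 32 bits of the size (another writer might store 0xFFFFFFFF instead): a 5 GiB stream shows up as 1 GiB, which a forward-only reader cannot distinguish from a genuine 1 GiB entry.

```python
GiB = 2**30


def as_seen_in_32_bits(size):
    """Value left in a 32-bit size field if a larger size is silently truncated."""
    return size & 0xFFFFFFFF


actual = 5 * GiB
stored = as_seen_in_32_bits(actual)
print(stored // GiB)  # 1
```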
Edit: This problem is worst if you write the archive forward-only AND read the archive in forward-only. If the stream is seekable, it is possible to forgo the post-data and use the zip64 extras, such that forward-only reading is possible.
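A sketch of the seekable alternative mentioned above, in Python with `io.BytesIO` standing in for a seekable stream. To keep it short it assumes a stored (uncompressed) entry, reserves the 20-byte zip64 extra field in the local header, and patches the CRC and sizes once the data is written, so bit 3 and the data descriptor are never used. The function name and the zeroed timestamp are illustrative choices, and the central directory is not shown.

```python
import io
import struct
import zlib


def write_entry_seekable(out, name, data):
    """Write one stored entry's local header + data, patching values afterwards."""
    fname = name.encode("utf-8")
    header_start = out.tell()
    out.write(struct.pack(
        "<IHHHHHIIIHH",
        0x04034B50,   # local file header signature
        45,           # version needed to extract (4.5 = zip64)
        0,            # general purpose flags: bit 3 NOT set
        0,            # compression method: stored
        0, 0,         # mod time, mod date (left as zero in this sketch)
        0,            # CRC-32 placeholder, patched below
        0xFFFFFFFF,   # compressed size: look in zip64 extra
        0xFFFFFFFF,   # uncompressed size: look in zip64 extra
        len(fname),
        20,           # extra field length
    ))
    out.write(fname)
    extra_pos = out.tell()
    out.write(struct.pack("<HHQQ", 0x0001, 16, 0, 0))  # placeholder zip64 extra
    out.write(data)
    end = out.tell()

    crc = zlib.crc32(data)
    out.seek(header_start + 14)  # CRC-32 sits 14 bytes into the local header
    out.write(struct.pack("<I", crc))
    out.seek(extra_pos + 4)      # skip header ID and data size
    out.write(struct.pack("<QQ", len(data), len(data)))  # stored: sizes are equal
    out.seek(end)
    return crc
```

A forward-only reader can then take the sizes straight from the local header's extra field instead of waiting for a trailing descriptor.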
You can force zip64 to have the extra headers, but I would expect errors to show up when values are hitting the
One detail I forgot to mention in the initial text is that the list of files report the
You could always just say upfront to write everything as zip64 and deal with streaming that way, then write post-data descriptors with 8-byte values. I guess that would be incompatible with other implementations?
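To make the compatibility concern concrete, here is a small sketch comparing the classic descriptor with one using 8-byte size fields (the helper is hypothetical). A reader that assumes 4-byte fields parses the wide form without any error, just with the wrong numbers:

```python
import struct

SIG = 0x08074B50


def descriptor(crc32, compressed, uncompressed, zip64):
    """Data descriptor with 4-byte (classic) or 8-byte (zip64) size fields."""
    fmt = "<IIQQ" if zip64 else "<IIII"
    return struct.pack(fmt, SIG, crc32, compressed, uncompressed)


classic = descriptor(0x1234, 10, 10, zip64=False)
wide = descriptor(0x1234, 10, 10, zip64=True)
print(len(classic), len(wide))  # 16 24

# A classic-layout reader takes the low half of the 8-byte compressed size
# as the compressed size, and its high half (zero) as the uncompressed size.
misread = struct.unpack_from("<IIII", wide)
```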
I guess if the code allows forward-only reading, but you'd have to read the central directory for the valid length, that's something. However, that's not how the reader works now. The central directory isn't used with readers, only archives. Ugh.
I guess the count should be 'ulong' like everything else.
This code really needs some refactor lovin too I think.
I guess we can look at "what others do", but that is likely to cause errors. If the readers expect 4-byte values, there should at least be some indicator somewhere that the values are not 4-byte.
Yes, but that would mean replacing the system
Should we do the same for the case where we use a seekable stream and do not have space for the zip64 extra fields?
Maybe we could then choose not to make space for it by default (removing my 2GiB threshold) and require the caller to set the zip64 flag when creating an archive or stream that will exceed 4GiB?
Ok, I will update the code to require explicit zip64 activation when writing streams larger than 4GiB. And disallow writing streams larger than 4GiB in forward-only mode.
Archives larger than 4GiB can be transparently supported, as long as all streams inside are less than 4GiB individually.
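The policy in the last two comments could look roughly like this Python sketch; the exception and function names are hypothetical, not the PR's actual API. Only per-entry sizes are checked, since archive-level offsets beyond 4GiB are handled separately in the central directory.

```python
MAX32 = 0xFFFFFFFF  # reserved as the zip64 marker, so real sizes must stay below it


class Zip64Required(Exception):
    pass


def check_entry_size(size, zip64_enabled, forward_only):
    """Reject entries the writer cannot represent correctly."""
    if size < MAX32:
        return  # fits in the classic 32-bit fields
    if forward_only:
        # the trailing data descriptor has no agreed 64-bit form
        raise Zip64Required("stream too large for 32-bit fields in forward-only mode")
    if not zip64_enabled:
        raise Zip64Required("enable zip64 before writing streams this large")
```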
I have updated the logic to require setting the zip64 flag before writing a stream larger than 4GiB.
I have added the check to the
Not sure how you prefer this to work, but we cannot know in advance whether a write will expand the file beyond 4GiB, so I have added a pre-check and a post-check. If we catch it in the pre-check, the file is correct (but with a shorter stream); if we catch it in the post-check, invalid data is already in the archive.
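A rough Python sketch of the pre-check/post-check split, assuming a deflate-compressed entry produced by `zlib` in raw mode (wbits -15, as zip uses). The class name and its `limit` parameter are hypothetical; the limit is configurable only so the behavior can be tried on small inputs. The uncompressed length of a chunk is known before writing, but the compressed size is only known after deflating, by which point the bytes are already in the sink.

```python
import io
import zlib


class EntryTooLarge(Exception):
    pass


class CheckedDeflateWriter:
    """Hypothetical entry writer with a pre-check and a post-check."""

    def __init__(self, sink, limit=0xFFFFFFFF):
        self.sink = sink
        self.limit = limit
        self.uncompressed = 0
        self.compressed = 0
        self._deflater = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw deflate

    def _emit(self, chunk):
        self.sink.write(chunk)
        self.compressed += len(chunk)
        if self.compressed > self.limit:
            # post-check: the oversized data has already been written
            raise EntryTooLarge("compressed size overflowed; archive is now invalid")

    def write(self, data):
        if self.uncompressed + len(data) > self.limit:
            # pre-check: nothing written, entry keeps only the earlier data
            raise EntryTooLarge("write rejected; entry would exceed the limit")
        self.uncompressed += len(data)
        self._emit(self._deflater.compress(data))

    def close(self):
        self._emit(self._deflater.flush())


buf = io.BytesIO()
writer = CheckedDeflateWriter(buf, limit=10)
writer.write(b"abc")
try:
    writer.write(b"x" * 20)  # caught by the pre-check
except EntryTooLarge as exc:
    print("pre-check:", exc)
writer.close()
```

Here the entry ends up valid but shorter, which matches the "pre-check" outcome described above; only an overflow detected after compression leaves bad data behind.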
I have also added a unit test for trying out various parts of the zip64 features.