-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Zip::File, Zip::Reader and Zip::Writer #3901
Conversation
I think there we are missing specs for zip with files nested in directories. |
@bcardiff Zip entries have a filename, for example |
I checked with a .zip created in windows and they do have forward slashes as path separator. This SO thread had better references. Regarding missing specs, I though entries were traversed in a tree fashion. Maybe a spec could be added to clarify that. |
Entries are stored sequentially inside a zip file. I must admit that I learned everything about the zip file format for implementing this functionality and I also didn't know how zip file entries were stored. I don't know how much of the spec we should repeat in the docs, though... For example, in the spec it says:
So you can put first a file named "foo/bar/baz.txt", then a directory named "foo/", then a directory named "foo/bar/" and so on, without any order that relates to a tree in the filesystem. There's also the thing that |
Does it support ZIP64 archives? |
@Sija No, ZIP64 is not supported, there's a note I left in the doc comment of the |
@asterite Roger that, thx for clarifying. |
# | ||
# `Zip::File` is the preferred method to read zip files if you have | ||
# can provide a `File`, because it's a bit more flexible and provides | ||
# more complete information for zip entries (such as comments). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[...] if you have can provide [...]
— there's either too much or too lil' of sth ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@asterite I've added a few comments: most of them questions or suggestions. There is only one small thing I think could be a bug, but maybe I'm wrong.
|
||
if self_bytesize < 0 | ||
raise ArgumentError.new("negative bytesize") | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this check be on bytesize
instead of self_bytesize
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right! I'll fix it
io.gets_to_end.should eq(File.read(filename)) | ||
end | ||
end | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd add a spec on the invalid arguments cases: negative offset and offset plus size out of bounds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I see there are such specs for memory. Maybe add them to file as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I'll add a spec too :-)
|
||
def unbuffered_write(slice : Bytes) | ||
raise IO::Error.new("can't flush read-only IO") | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
flush => write?
end | ||
|
||
typeof(Zip::File.new("file.zip")) | ||
typeof(Zip::File.open("file.zip") { }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are these two typeof calls for here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They check that these calls compile, regardless of what they do. Tests for these should almost be the same as previous tests, but I want to make sure that their body compiles fine.
typeof(Zip::Reader.open("file.zip") { }) | ||
|
||
typeof(Zip::Writer.new("file.zip")) | ||
typeof(Zip::Writer.open("file.zip") { }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, maybe I'm missing some standard practice in the specs?
# Computes a CRC32 while reading from an underlying IO, | ||
# optionally verifying the computed value against an | ||
# expected one. | ||
private class ChecksumReader |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want this class to be private and scoped in the Zip
module? As far as I understand, CRC32 is quite useful in many other applications, so maybe we could move it to Zlib
and make it public. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could do that, maybe in a next PR. I checked some other implementations, for example Go has something similar and it's private to the zip package, maybe it's not that useful in other contexts.
module Zip | ||
# Counts written bytes and optionally computes a CRC32 | ||
# checksum while writing to an underlying IO. | ||
private class ChecksumWriter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as with Reader
directory_end_offset = find_directory_end_offset | ||
entries_size, directory_offset = read_directory_end(directory_end_offset) | ||
@entries = Array(Entry).new(entries_size) | ||
@entries_by_filename = {} of String => Entry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Zip allows duplicate entries, though it seems to be a VERY rare use case, and can still be handled by simply iterating the @entries
. If we want to support it, then @entries_by_filename
should be a hash from string to Array(Entry)
, which I don't particularly like since it makes the API more awkward. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took inspiration from Java's API. Apparently C# has it too. They don't mention the possibility of duplicated entries. Maybe it's so uncommon that it isn't worth considering, and a nicer API is better.
io.skip(22) | ||
filename_length = read(io, UInt16) | ||
extra_length = read(io, UInt16) | ||
@offset + 30 + filename_length + extra_length |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm gonna go ahead and trust you in the 22 and 30 there :-P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's basically skipping 22 bytes from the header instead of reading them one by one. 30 is the size of the header without counting the filename length and extra length.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get a code comment about those constants?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RX14 @spalladino here you go: 6952e76
zip["one.txt"].open(&.gets_to_end).should eq("One") | ||
zip["two.txt"].open(&.gets_to_end).should eq("Two") | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How was the test.zip generated? I was going to suggest adding a spec to read a zip file generated with a standard tool (otherwise we are just checking that we can read the zip files that we have written ourselves), but if this file was created somehow else, then that covers it.
However, the other way around would also apply. Can we can rely to have a zip command-line tool on the CI (at least for the major architectures), so we test that it can open zipfiles generated from Crystal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I generated test.zip
by creating a zip file using Mac, right-click + compress. In fact when I did that I found some bugs in the original code I had, so that's why later I decided to add a zip file that wasn't generated by the same Zip
module.
I don't know if we have a zip
command line tool on the CI, but it's definitely a possibility (but maybe fore the future)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding zip
to CI shouldn't be hard in the future.
Additionally add File#read_at and IO::Memory#read_at
Please check the docs for
Zip
,Zip::File
,Zip::Reader
andZip::Writer
. The docs explain why there are two types for reading a zip file.I needed to add
File#read_at
andIO::Memory#read_at
for this. This isn't strictly necessary, but provides a way to read multiple files from a zip file (stored in aFile
orIO::Memory
) concurrently without problems, so you could for example extract multiple files concurrently. ForFile
I usedpread
, and forIO::Memory
I use a sub-view. Go also usespread
for reading from a zip file.