New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix issue 20239 - chameleon zip-file should be rejected by std.zip #7198
Conversation
Thanks for your pull request and interest in making D better, @crocopaw! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please see CONTRIBUTING.md for more information. If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment. Bugzilla references
Testing this PR locallyIf you don't have a local development environment setup, you can use Digger to test this PR: dub fetch digger
dub run digger -- build "master + phobos#7198" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding and reading a directory from the end seems like an optimization attempt for slow disk devices. But, std.zip
already has the data in a void[] buffer
. In this case, wouldn't it be simpler and more correct to simply walk all archive members from the beginning to the end?
That was my first idea too. But when I dived more deeply into the data format of zip files it was soon clear, that this is impossible. The chunks inside a zip file do not need to be alligned (nor sorted). There can be other stuff before (think of self extracting zip files) and between (e.g. removed files from the archive, where only the entry in the end record has been deleted). Even overlapping is not prohibitted by the specs. This gives a lot of possibilities for malware. Avoiding being trapped by malware makes it necessary to start at the end of central directory record. Unfortunately zip allowes for a comment (of apriori unknown size) after that record, which again gives leeway for malware (see bug report). I think, best one can do is to scan these 16k of bytes. While the old approach stopped when it found one correct entry, the new version looks if there is another correct entry and rejects the whole zip file in this case. Having more than one correct entry is a strong indicator of having to do with malware, because in normal zip files, there should never be comments containing x05/x06 bytes. And the chances, that the data before the intended record form a valid second entry is almost zero too, but in rare circumstances possible. And one more reason to start at the end: Although the whole file is kept in memory at the moment, I think it might be reasonable to improve this later by replacing the 'void[] buffer' with a random-access-range. This will speed things up in some circumstances (think of a large zip-archive, where only few small files need to be retrieved and therefore only part of the archive needs to be read from file). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed explanation.
Please address the constness nit. LGTM otherwise.
I moved the problematic enum back into the code. I'll cope with those magic constants in a separate PR. |
See PR #7201 for a solution of the problem with the enum. |
Resolved merge conflicts with PR #7201. |
The lines around 640 where wrongly indented. I corrected them, although this is not directly related to the bug fix. Hope, this doesn't disturb.