Executable bit in ZIP-archives get thrown away when reading from stdin. #1106

Closed
dreirund opened this issue Dec 11, 2018 · 9 comments

@dreirund

dreirund commented Dec 11, 2018

I found that the bsdtar command from the libarchive package (under Arch Linux, at least) throws away the executable bits of files in .zip archives when reading from stdin, but not when working directly on the file.

For .tar archives, it preserves the executable bit when reading from stdin as well.

bsdtar --version: bsdtar 3.3.3 - libarchive 3.3.3 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.6 liblz4/1.8.2 libzstd/1.3.5.

Test case:

I made a test archive http://felics.kettenbruch.de/files/archive_executable_bit_test/archive_exevutable_bit_test.zip which contains two files in a subdirectory. One file is executable, the other not.

Extracting directly:

wget -q -O archive_exevutable_bit_test.zip http://felics.kettenbruch.de/files/archive_executable_bit_test/archive_exevutable_bit_test.zip
bsdtar -x -f archive_exevutable_bit_test.zip
ls -nl archive_exevutable_bit_test/*

shows

-rwxr-xr-x 1 1001 1001 35 Dec 11 13:38 archive_exevutable_bit_test/executable.sh
-rw-r--r-- 1 1001 1001 33 Dec 11 13:39 archive_exevutable_bit_test/non-executable.txt

The executable bit for executable.sh is present here.

Reading from stdin:

wget -q -O - http://felics.kettenbruch.de/files/archive_executable_bit_test/archive_exevutable_bit_test.zip | bsdtar -x -f -
ls -nl archive_exevutable_bit_test/*

shows

-rw-r--r-- 1 1001 1001 35 Dec 11 13:38 archive_exevutable_bit_test/executable.sh
-rw-r--r-- 1 1001 1001 33 Dec 11 13:39 archive_exevutable_bit_test/non-executable.txt

The executable bit for executable.sh is thrown away here.

.tar-archive:

As a comparison, for a .tar archive the executable bit in the archive is honoured even when reading from stdin:

wget -q -O - http://felics.kettenbruch.de/files/archive_executable_bit_test/archive_exevutable_bit_test.tar | bsdtar -x -f -
ls -nl archive_exevutable_bit_test/*

shows

-rwxr-xr-x 1 1001 1001 35 Dec 11 13:38 archive_exevutable_bit_test/executable.sh
-rw-r--r-- 1 1001 1001 33 Dec 11 13:39 archive_exevutable_bit_test/non-executable.txt

Expected behaviour:

  • Permission handling should not depend on the source from which the archive is read.
  • Permission handling should not depend on the type of the archive.
@jsonn
Contributor

jsonn commented Dec 11, 2018

Zip archives contain two different ways to describe the content:
(1) A per-entry header
(2) A central directory at the end of the zip file.
libarchive (and bsdtar by extension) will use the central directory if seeking is possible on the input, otherwise it will fall back to the streaming-only logic. The entries are not necessarily consistent as you found out in your test case. There isn't really much we can or want to do about this. Note that you can replace wget with a plain cat and it will still show the same behavior.

The short version is that this is an inherent issue with streaming of zip files and something that won't be fixed.
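To make the two descriptions concrete, here is a minimal Python sketch (not libarchive code). It builds a small archive in memory, walks the per-entry headers front to back the way a non-seeking reader has to, and then reads the central directory through the standard `zipfile` module. The helper names are made up for illustration, and the streaming walker assumes entries whose sizes are recorded in the local header (no trailing data descriptor).

```python
import io
import stat
import struct
import zipfile

# Fixed 30-byte part of a ZIP local (per-entry) file header, per APPNOTE.TXT:
# signature, version needed, flags, method, mod time, mod date,
# crc32, compressed size, uncompressed size, name length, extra length.
# Note that none of these fields carries Unix permissions.
LOCAL_HEADER = struct.Struct("<4s5H3I2H")


def build_test_zip() -> bytes:
    """Build an in-memory archive with one executable and one plain file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, mode in (("executable.sh", 0o755), ("non-executable.txt", 0o644)):
            info = zipfile.ZipInfo(name)
            info.create_system = 3  # 3 = Unix: upper 16 bits of external_attr hold st_mode
            info.external_attr = (stat.S_IFREG | mode) << 16
            zf.writestr(info, "echo hello\n")
    return buf.getvalue()


def names_from_local_headers(stream) -> list:
    """Streaming view: read per-entry headers in order, never seeking."""
    names = []
    while True:
        raw = stream.read(LOCAL_HEADER.size)
        if len(raw) < LOCAL_HEADER.size or raw[:4] != b"PK\x03\x04":
            break  # reached the central directory (or unexpected data)
        (_sig, _ver, flags, _method, _time, _date,
         _crc, csize, _usize, name_len, extra_len) = LOCAL_HEADER.unpack(raw)
        if flags & 0x08:
            raise ValueError("sizes live in a trailing data descriptor; not handled here")
        names.append(stream.read(name_len).decode("utf-8"))
        stream.read(extra_len + csize)  # skip the extra field and the file data
    return names


def modes_from_central_directory(data: bytes) -> dict:
    """Seeking view: zipfile locates the central directory at the end of the file."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return {i.filename: stat.filemode(i.external_attr >> 16) for i in zf.infolist()}


if __name__ == "__main__":
    data = build_test_zip()
    print(names_from_local_headers(io.BytesIO(data)))
    print(modes_from_central_directory(data))
```

The streaming walker recovers names and sizes but has no field to take a mode from, while the central-directory view reports the mode of each entry.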

@jsonn jsonn closed this as completed Dec 11, 2018
@dreirund
Author

dreirund commented Dec 11, 2018

According to http://unix.stackexchange.com/questions/487338#487371, this also happens if bsdtar itself created the ZIP archive. Shouldn't libarchive at least create consistent meta-information (per-entry header and central directory holding the same information), so that archives created by libarchive are extracted correctly by libarchive? Maybe this is then a bug in libarchive, in that it creates ZIP archives with inconsistent information?

Is there any standard to ZIP which information (per-entry header or central directory) is more to trust?

@jsonn
Contributor

jsonn commented Dec 11, 2018

bsdtar doesn't write the per-entry permission extension by default; it can be requested with --options zip:experimental.

@jsonn
Contributor

jsonn commented Dec 12, 2018

Because ISO files are not streamable in a meaningful way in most situations. File attributes, on the other hand, are often absent from zip files anyway.

@kientzle
Contributor

kientzle commented Dec 15, 2018

As Joerg pointed out, there are basic limitations with some of the formats we deal with:

  • Tar files are always read in a streaming fashion, so always work the same way. If you need to work with streaming archives a lot, tar format is a good choice.
  • Zip files store file metadata in two different ways: partial metadata is stored with each entry; full metadata is stored at the end of the archive. Libarchive's Zip reader will seek to obtain full metadata if it can; otherwise it will use the partial metadata.
  • ISO allows file attributes to be stored before or after the entry. Libarchive's ISO reader will seek to obtain out-of-order metadata if it can; otherwise it will fail.

As a workaround, libarchive's Zip support includes an experimental extension (developed in conjunction with the Info-Zip maintainers) that puts more complete metadata with each entry. I hope to enable this by default at some point.

In theory, the streaming Zip reader could read the full metadata when it does get to the end and update all the files. This would require some careful rework of the Zip reader and probably changes to the logic that writes files to disk. In essence, every file would get "written to disk" twice: Once with full data and partial metadata, again with full metadata and no data.
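The two-pass idea can be sketched in Python as follows. This is only an illustration under stated assumptions, not libarchive's logic: `zipfile.extractall` stands in for the first pass (data written with default permissions), the second pass is a plain `os.chmod` driven by the central directory, and the helper names are hypothetical. A POSIX filesystem is assumed.

```python
import io
import os
import stat
import tempfile
import zipfile


def build_test_zip() -> bytes:
    """Build an in-memory archive with one executable and one plain file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, mode in (("executable.sh", 0o755), ("non-executable.txt", 0o644)):
            info = zipfile.ZipInfo(name)
            info.create_system = 3  # Unix host: external_attr carries st_mode
            info.external_attr = (stat.S_IFREG | mode) << 16
            zf.writestr(info, "echo hello\n")
    return buf.getvalue()


def extract_then_fix_modes(data: bytes, dest: str) -> None:
    """Write the data first, then apply central-directory permissions."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        # Pass 1: full data, partial metadata (files get default permissions).
        zf.extractall(dest)
        # Pass 2: full metadata, no data.
        for info in zf.infolist():
            mode = stat.S_IMODE(info.external_attr >> 16)
            if info.create_system == 3 and mode:
                os.chmod(os.path.join(dest, info.filename), mode)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as dest:
        extract_then_fix_modes(build_test_zip(), dest)
        for name in sorted(os.listdir(dest)):
            print(stat.filemode(os.stat(os.path.join(dest, name)).st_mode), name)
```

A real streaming reader would also have to cope with entries that the central directory omits or renames, which this sketch ignores.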

@kientzle
Contributor

Is there any standard to ZIP which information (per-entry header or central directory) is more to trust?

The Zip standard is here:
https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT

If you study this carefully, you'll notice that the file permissions are only stored in the central directory. All other metadata should be the same. The zip:experimental adds an extension to the per-entry header which duplicates the file permissions that are present in the central directory.

@epicfaace

  • Libarchive's Zip reader will seek to obtain full metadata if it can; otherwise it will use the partial metadata.

@kientzle quick question -- when libarchive is streaming a .zip file and using only the partial metadata, how does it deal with the possibility mentioned here that some files might not actually be listed in the central directory and thus should not be extracted, as well as the possibility that there is extra data between file chunks or before the first file chunk? Does it just assume that the zip file isn't one of these special cases, or does it try to read the central directory at the end to somehow correct what has already been extracted?

@kientzle
Contributor

kientzle commented Jun 1, 2021

In theory, libarchive could stream Zip archives by extracting all the entries, then reading the central directory and using that information to edit the data on disk. It does not currently do this. As a result, it cannot fully handle some of the pathological cases you describe while performing a streaming extraction.

Libarchive does have error-recovery logic that can to a limited extent deal with garbage data appearing in the archive (between entries or before the first entry). You can see the details starting around line 3146 of the read_header function here:

archive_read_format_zip_streamable_read_header(struct archive_read *a,
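As a rough idea of what this kind of recovery can look like, the Python sketch below scans a buffer for the next plausible local file header after some garbage. It is not a transcription of libarchive's code: the plausibility checks (known compression method, file name that fits in the buffer) and the function name are assumptions chosen for illustration.

```python
import struct

LOCAL_SIG = b"PK\x03\x04"
# Fixed 30-byte part of a ZIP local file header, per APPNOTE.TXT.
LOCAL_HEADER = struct.Struct("<4s5H3I2H")
KNOWN_METHODS = {0, 8}  # stored and deflate; an assumption for this sketch


def find_next_entry(buf: bytes, start: int = 0) -> int:
    """Return the offset of the next plausible local header, or -1 if none."""
    pos = buf.find(LOCAL_SIG, start)
    while pos != -1:
        if pos + LOCAL_HEADER.size <= len(buf):
            fields = LOCAL_HEADER.unpack_from(buf, pos)
            method, name_len = fields[3], fields[9]
            room = len(buf) - pos - LOCAL_HEADER.size
            # Reject signature look-alikes whose header fields make no sense.
            if method in KNOWN_METHODS and 0 < name_len <= room:
                return pos
        pos = buf.find(LOCAL_SIG, pos + 1)
    return -1


if __name__ == "__main__":
    # A hand-built entry: stored, 5-byte name, 5 bytes of data.
    entry = LOCAL_HEADER.pack(LOCAL_SIG, 20, 0, 0, 0, 0, 0, 5, 5, 5, 0) + b"a.txt" + b"hello"
    garbage = b"garbage" + LOCAL_SIG + b"\xff" * 40  # contains a false signature
    print(find_next_entry(garbage + entry), len(garbage))
```

Heuristics like this can only skip over junk; they cannot tell whether an entry found this way is also listed in the central directory.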
