Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

another exception when a tarfile contains a path called '.' #284

Closed
mrg0029 opened this issue May 10, 2019 · 9 comments · Fixed by #291
Closed

another exception when a tarfile contains a path called '.' #284

mrg0029 opened this issue May 10, 2019 · 9 comments · Fixed by #291
Assignees
Labels

Comments

@mrg0029
Copy link
Contributor

mrg0029 commented May 10, 2019

I was running into this problem until the 2.4.5 release.
I no longer get an IndexError, but I now get a ResourceNotFound when testing the same file.
In Debian/Ubuntu/etc. this should reproduce it:

$ pushd $(mktemp -d)
$ apt-get download hello
$ ar t hello*
    debian-binary
    control.tar.gz
    data.tar.xz
$ ar x hello*
$ python -c 'import tarfile; tarfile.open("data.tar.xz").getmembers(); print("OK")'
    OK

$ python -c 'import fs; fs.open_fs("tar://data.tar.xz").walk.files().__next__(); print("OK")'
    Traceback (most recent call last):
      File "site-packages/fs/tarfs.py", line 330, in getinfo
        member = self._tar.getmember(self._encode(_path))
      File "tarfile.py", line 1752, in getmember
        raise KeyError("filename %r not found" % name)
    KeyError: "filename 'usr' not found"

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "site-packages/fs/walk.py", line 362, in files
        for _path, info in self._iter_walk(fs, path=path):
      File "site-packages/fs/walk.py", line 433, in _walk_breadth
        for info in _scan(fs, dir_path, namespaces=namespaces):
      File "site-packages/fs/walk.py", line 298, in _scan
        six.reraise(type(error), error)
      File "site-packages/six.py", line 692, in reraise
        raise value.with_traceback(tb)
    fs.errors.ResourceNotFound: resource '/usr' not found

I don't get the same error when creating a file manually.

$ pushd $(mktemp -d)
$ touch a b c
$ tar --create      -f stuff1.tar    a b c
$ tar --create --xz -f stuff2.tar.xz a b c
$ python -c 'import fs; fs.open_fs("tar://stuff1.tar").walk.files().__next__(); print("OK")'
    OK
$ python -c 'import fs; fs.open_fs("tar://stuff2.tar.xz").walk.files().__next__(); print("OK")'
    OK

Edit: fixed a couple of the commands per lurch's reply

@lurch
Copy link
Contributor

lurch commented May 10, 2019

tar tvf data.tar.gz shows:

drwxr-xr-x root/root         0 2013-08-26 17:25 ./
drwxr-xr-x root/root         0 2013-08-26 17:25 ./usr/
drwxr-xr-x root/root         0 2013-08-26 17:25 ./usr/bin/
-rwxr-xr-x root/root     22984 2013-08-26 17:25 ./usr/bin/hello
drwxr-xr-x root/root         0 2013-08-26 17:25 ./usr/share/
drwxr-xr-x root/root         0 2013-08-26 17:25 ./usr/share/info/
-rw-r--r-- root/root     11729 2013-08-26 17:25 ./usr/share/info/hello.info.gz
drwxr-xr-x root/root         0 2013-08-26 17:25 ./usr/share/doc/
drwxr-xr-x root/root         0 2013-08-26 17:25 ./usr/share/doc/hello/
-rw-r--r-- root/root      3108 2012-04-19 14:51 ./usr/share/doc/hello/NEWS
-rw-r--r-- root/root      2251 2012-05-27 21:07 ./usr/share/doc/hello/copyright
-rw-r--r-- root/root      1180 2013-08-26 17:25 ./usr/share/doc/hello/changelog.Debian.gz
drwxr-xr-x root/root         0 2013-08-26 17:25 ./usr/share/man/
drwxr-xr-x root/root         0 2013-08-26 17:25 ./usr/share/man/man1/
-rw-r--r-- root/root       729 2013-08-26 17:25 ./usr/share/man/man1/hello.1.gz

Looking at the change in #281 it seems the paths in the tar archive were previously 'translated' with relpath(self._decode(info.name)).rstrip("/") but are now 'translated' (and cached) with self._decode(info.name).strip("/"), however there was no matching change to the code that reads the directory paths; so I'm guessing that there's some mismatch between different parts of the code expecting ./usr or /usr or usr ?
Shouldn't be too hard for @willmcgugan to create a test-case and fix for this 😉 (or I might have a go myself over the weekend if I'm not busy)

P.S. I assume your pushd (mktemp -d) was supposed to be pushd $(mktemp -d) ? And on Ubuntu 14.04 apt download ... doesn't work, but apt-get download ... does 🙂

@lurch
Copy link
Contributor

lurch commented May 12, 2019

Okay, I've done some poking around, and AFAICT this patch seems to fix things, and make test still passes with no errors.

diff --git a/fs/tarfs.py b/fs/tarfs.py
index fcd7ceb..c89e141 100644
--- a/fs/tarfs.py
+++ b/fs/tarfs.py
@@ -279,9 +279,9 @@ class ReadTarFS(FS):
         """Lazy directory cache."""
         if self._directory_cache is None:
             _decode = self._decode
-            _directory = ((_decode(info.name).strip("/"), info) for info in self._tar)
+            _directory = ((normpath(_decode(info.name).strip("/")), info) for info in self._tar)
             self._directory_cache = OrderedDict(
-                (name, info) for name, info in _directory if normpath(name)
+                (name, info) for name, info in _directory if name
             )
         return self._directory_cache
 
@@ -327,7 +327,7 @@ class ReadTarFS(FS):
         else:
             try:
                 implicit = False
-                member = self._tar.getmember(self._encode(_path))
+                member = self._directory[_path]
             except KeyError:
                 if not self.isdir(_path):
                     raise errors.ResourceNotFound(path)

It also potentially makes getinfo faster as I assume self._directory[_path] will always be faster than self._tar.getmember(self._encode(_path)) ? However if the fix is really that easy it makes me wonder why the code wasn't doing that in the first place?

@mrg0029 Please run your own tests and let me know if the patch above fixes things for you?

@willmcgugan I'm afraid I don't have the time to turn this patch into a "proper" PR, with unit-tests etc. so feel free to do that part yourself 😉

@lurch
Copy link
Contributor

lurch commented May 12, 2019

Ping @althonos as it looks like he did most of the tarfs work.

@mrg0029
Copy link
Contributor Author

mrg0029 commented May 13, 2019

Your change fixes the project I was working on (and also fixes the example in my original post).

I couldn't get the test suite to pass locally. Looks like that's unrelated to your change though; nosetests fails on the last tagged release too. _delay_file_utime() (in tests/test_copy.py) acts funny when run from the time zone I'm in. But that's something for a separate issue.

mrg0029 pushed a commit to mrg0029/pyfilesystem2 that referenced this issue May 15, 2019
@willmcgugan
Copy link
Member

sigh Starting to dislike tar files. Thanks for doing the initial sluething @lurch That likely is the fix.

Something that concerns me regarding tar files the name (ie path) of a file may contain a relative path, which points outside of the root. e.g. you could have a path which references "../../foo.txt" which if trusted would uncompress a file to a directory above where you want it to be.

I tested that on MacOS and if trying to open it via the UI it fails with an "Operation not permitted" error.

If you open such a file with TarFS, it appears to work, but just about every operation results in a IllegalBackReference error. Which is fair enough, but it I'm wondering if this is the desired behavior, or a more explicit error would be better.

I may be overthinking this. I'll stew on it for a bit.

@willmcgugan willmcgugan self-assigned this Jun 2, 2019
@willmcgugan willmcgugan mentioned this issue Jun 2, 2019
9 tasks
@lurch
Copy link
Contributor

lurch commented Jun 3, 2019

Starting to dislike tar files.

Yah, IIRC the number of inconsistencies and edge-cases with tarfiles was what stopped me from finishing my TarFS code for the original pyfilesystem (back in the days when we were still using googlegroups and googlecode!) 😉

@mrg0029
Copy link
Contributor Author

mrg0029 commented Jun 7, 2019

I spoke too soon when I said my deb file example was fixed.

Tried fixing it myself; no luck so far.

This example should be easier to reproduce on MacOS. Both of these files contain paths with leading /s. The first one works fine. In the second, .getbasic() works but .gettext() fails.

wget http://ftp.debian.org/debian/pool/main/h/hello/hello_2.10-2.debian.tar.xz -O hello.tar.xz
wget http://ftp.debian.org/debian/pool/main/h/hello/hello_2.10-2_amd64.deb     -O hello.deb
ar x hello.deb data.tar.xz
>>> import fs

>>> a = fs.open_fs("tar://hello.tar.xz")
>>> a.getbasic('/debian/changelog')
<file 'changelog'>
>>> a.gettext('/debian/changelog')
# (it works)

>>> b = fs.open_fs("tar://data.tar.xz")
>>> b.getbasic('/usr/share/doc/hello/copyright')
<file 'copyright'>
>>> b.gettext('/usr/share/doc/hello/copyright')
ResourceNotFound: resource '/usr/share/doc/hello/copyright' not found

(these are Debian's files for GNU's Hello World program, by the way)

@lurch
Copy link
Contributor

lurch commented Jun 7, 2019

Looking at the tar -tvf output, it seems as though hello.tar.xz is storing filenames like debian/changelog, whereas data.tar.xz is storing filenames like ./usr/share/doc/hello/copyright (although I dunno why that makes getbasic() and gettext() work differently).

I might take a more in-dpeth look over the weekend, if a) I remember and b) @willmcgugan doesn't beat me to it 😉

EDIT: #291 contains a more complete patch than the one I posted above - try using that if you haven't already!

@willmcgugan
Copy link
Member

@mrg0029 Should be fixed in v2.4.6

Thanks @lurch for the assist

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Dec 13, 2019
[2.4.11]:
Added
Added geturl for TarFS and ZipFS for 'fs' purpose. NoURL for 'download' purpose.
Added helpful root path in CreateFailed exception
Added Python 3.8 support

Fixed
Fixed tests leaving tmp files
Fixed typing issues
Fixed link namespace returning bytes
Fixed broken FSURL in windows
Fixed hidden exception at fs.close() when opening an absent zip/tar file URL
Fixed abstract class import from collections which would break on Python 3.8
Fixed incorrect imports of mock on Python 3
Removed some unused imports and unused requirements.txt file
Added mypy checks to Travis.
Fixed missing errno.ENOTSUP on PyPy.
Fixed bug in a decorator that would trigger an AttributeError when a class was created that implemented a deprecated method and had no docstring of its own.

Changed
Entire test suite has been migrated to pytest.
Style checking is now enforced using flake8; this involved some code cleanup such as removing unused imports.

[2.4.10]:
Fixed
Fixed broken WrapFS.movedir

[2.4.9]:
Fixed
Restored fs.path import
Fixed potential race condition in makedirs.
Added missing methods to WrapFS.

Changed
MemFS now immediately releases all memory it holds when close() is called, rather than when it gets garbage collected.
FTPFS now translates EOFError into RemoteConnectionError.
Added automatic close for filesystems that go out of scope.

[2.4.8]:
Changed
geturl will return URL with user/password if needed @zmej-serow

[2.4.7]:
Added
Flag to OSFS to disable env var expansion

[2.4.6]:
Added
Implemented geturl in FTPFS @zmej-serow

Fixed
Fixed FTP test suite when time is not UTC-0 @mrg0029
Fixed issues with paths in tarfs PyFilesystem/pyfilesystem2#284

Changed
Dropped Python3.3 support

[2.4.5]:
Fixed
Restored deprecated setfile method with deprecation warning to change to writefile
Fixed exception when a tarfile contains a path called '.' PyFilesystem/pyfilesystem2#275
Made TarFS directory loading lazy

Changed
Detect case insensitivity using by writing temp file

[2.4.4]:
Fixed
OSFS fail in nfs mounts

[2.4.3]:
Fixed
Fixed broken "case_insensitive" check
Fixed Windows test fails

[2.4.2]:
Fixed
Fixed exception when Python runs with -OO

[2.4.1]:
Fixed
Fixed hash method missing from WrapFS

[2.4.0]:
Added
Added exclude and filter_dirs arguments to walk

Micro-optimizations to walk
[2.3.1]:
Fixed
Add encoding check in OSFS.validatepath

[2.3.0]:
Fixed
IllegalBackReference had mangled error message

Added
FS.hash method
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants