Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent recursive directories in the tarballs #958

Open
giordano opened this issue Oct 25, 2020 · 4 comments
Open

Prevent recursive directories in the tarballs #958

giordano opened this issue Oct 25, 2020 · 4 comments

Comments

@giordano
Copy link
Member

giordano commented Oct 25, 2020

https://github.com/JuliaBinaryWrappers/Poppler_jll.jl/releases/download/Poppler-v0.87.0%2B1/Poppler.v0.87.0.x86_64-w64-mingw32-cxx11.tar.gz has a recursive bin/ directory. No idea what we can do, but it's probably a good to have a mechanism to avoid this.

Screenshot_20201025_161431

@StefanKarpinski
Copy link

StefanKarpinski commented Oct 25, 2020

There's logic in https://github.com/JuliaIO/Tar.jl/blob/853aa7f1c259a15b42c1ff09b016b3eb2e9e9db4/src/extract.jl#L89-L170 to correctly handle all cases; it could maybe be extracted or reused somehow. It's impressively tricky to detect all the ways that symlinks can create circularity or escape from the archive.

@giordano
Copy link
Member Author

For the record, a mechanism to detect recursion would be useful also for #952

@StefanKarpinski
Copy link

There are two basic kinds of circularity to worry about:

  1. Self-reference: a link points to itself directly or through a sequence of other links;
  2. Self-containment: a link points to a prefix of itself directly or indirectly.

You can capture both of these as:

A symlink is circular if it points to a prefix of itself (including itself) via a series of symlinks.

I worked with paths in Tar.jl because that's what I already had a tree of, but it's much easier if you build a tree representation first. So you'd want to build a tree where each entry is either a file, a directory or a symlink. If you split a symlink on slashes into a vector of parts, you can then resolve the symlink target by walking up and down the tree. Specifically: .. goes up, . stays at the same node if the node is a directory or is an error if the node is a file, name goes to a child. I think all of this can be captured with a single rule if you explicitly put . and .. entries in each directory and represent files as being empty (so . and .. lookups fail). That resolves symlinks to actually point at an actual node in a tree. Next, you need to resolve chains of symlinks being careful to detect loops. Finally, if the ultimate target of any symlink is not an ancestor of itself, then it should be kosher.

If you do all of that, then there's the problem of copy order, which is itself pretty tricky, but isn't necessary for just checking an archive. All this is complicated enough and hard enough to get right that it should maybe be made into a standalone utility package.

@StefanKarpinski
Copy link

Actually, I just came up with a new, previously-undetected way to create uncopiable symlinks 😂

.:
total 0
drwxr-xr-x 3 stefan staff 96 Oct 26 12:03 bar/
drwxr-xr-x 3 stefan staff 96 Oct 26 12:03 foo/

./bar:
total 0
lrwxr-xr-x 1 stefan staff 6 Oct 26 12:03 foo -> ../foo/

./foo:
total 0
lrwxr-xr-x 1 stefan staff 6 Oct 26 12:03 bar -> ../bar/

And indeed, in testing, this hangs Tar.extract with copy_symlinks = true (issue: JuliaIO/Tar.jl#77).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants