Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DirEntry::path_is_symlink returns incorrect result for first result from ignore iterators #984

Closed
ssokolow opened this Issue Jul 17, 2018 · 3 comments

Comments

Projects
None yet
2 participants
@ssokolow
Copy link

ssokolow commented Jul 17, 2018

What version of ripgrep are you using?

I'm using the ignore crate, version 0.4.2, but you don't seem to have a separate repository for it.

How did you install ripgrep?

I added ignore = "0.4" to my Cargo.toml

What operating system are you using ripgrep on?

ssokolow@monolith ignore_test [master] % lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.5 LTS
Release:        14.04
Codename:       trusty

Describe your question, feature request, or bug.

Iterators from the ignore crate incorrectly identify the argument passed to them as a symlink when they return it during the first iteration.

This is actually quite the annoyance for a project I've just started, because it means I'm going to need to design some kind of minimally ugly hack to do a second syscall to get the actual value but only on the first iteration to avoid needlessly bogging down the following iterations.

If this is a bug, what are the steps to reproduce the behavior?

use std::fs::metadata;

extern crate ignore;
use ignore::Walk;

fn main() {
    for result in Walk::new("./") {
        if let Ok(entry) = result {
            let verification = metadata(entry.path()).unwrap().file_type().is_symlink();
            println!("{}: {}", entry.path_is_symlink() == verification, entry.path().display());
        }
    }
}

If this is a bug, what is the actual behavior?

ssokolow@monolith ignore_test [master] % cargo run
   Compiling ignore_test v0.1.0 (file:///home/ssokolow//src/ignore_test)
    Finished dev [unoptimized + debuginfo] target(s) in 4.72 secs
     Running `target/debug/ignore_test`
false: ./
true: ./Cargo.lock
true: ./src
true: ./src/main.rs
true: ./target
true: ./target/debug
[output truncated for brevity as the rest is irrelevant]

(Note that the first entry is false, indicating that the result returned by ignore's path_is_symlink doesn't match the result returned by explicitly querying the filesystem.)

If this is a bug, what is the expected behavior?

ssokolow@monolith ignore_test [master] % cargo run
   Compiling ignore_test v0.1.0 (file:///home/ssokolow//src/ignore_test)
    Finished dev [unoptimized + debuginfo] target(s) in 4.72 secs
     Running `target/debug/ignore_test`
true: ./
true: ./Cargo.lock
true: ./src
true: ./src/main.rs
true: ./target
true: ./target/debug
[output truncated for brevity as the rest is irrelevant]

(Note that the first entry is true as well)

@BurntSushi

This comment has been minimized.

Copy link
Owner

BurntSushi commented Jul 17, 2018

Nice bug. This bug is actually present in walkdir in addition to the parallel walker in ignore as well. (I believe that's a total of two distinct bugs, since the single threaded ignore walker mostly just defers to walkdir internally.)

Example, riffing off of yours:

extern crate ignore;
extern crate walkdir;

use std::fs::metadata;

use ignore::{Walk, WalkBuilder, WalkState};
use walkdir::WalkDir;

fn main() {
    for result in WalkDir::new("./") {
        if let Ok(entry) = result {
            let verification = metadata(entry.path())
                .unwrap()
                .file_type()
                .is_symlink();
            println!(
                "{}: {}",
                entry.path_is_symlink() == verification,
                entry.path().display(),
            );
        }
    }

    println!("\n----------------------------------------------\n");

    for result in Walk::new("./") {
        if let Ok(entry) = result {
            let verification = metadata(entry.path())
                .unwrap()
                .file_type()
                .is_symlink();
            println!(
                "{}: {}",
                entry.path_is_symlink() == verification,
                entry.path().display(),
            );
        }
    }

    println!("\n----------------------------------------------\n");

    let par = WalkBuilder::new("./").build_parallel();
    par.run(|| {
        Box::new(|result| {
            if let Ok(entry) = result {
                let verification = metadata(entry.path())
                    .unwrap()
                    .file_type()
                    .is_symlink();
                println!(
                    "{}: {}",
                    entry.path_is_symlink() == verification,
                    entry.path().display(),
                );
            }
            WalkState::Continue
        })
    });
}

The above shows the same bug in all three instances.

@ssokolow

This comment has been minimized.

Copy link
Author

ssokolow commented Jul 17, 2018

*chuckle* I seem to have been finding quite a few good bugs in the project I just started.

For example, only an hour or two after I made a note to come back to that, I got serde_json to panic because it turns out I have a file on one of my ext3 partitions with a POSIX timestamp of -1.

@BurntSushi

This comment has been minimized.

Copy link
Owner

BurntSushi commented Jul 17, 2018

This shouldn't be too hard to fix. It looks like the bug is that the initial path is turned into an entry using from_link, which always sets follow_link to true, which in turn implies that path_is_symlink always returns true.

My guess is that I erroneously used from_link for the initial path because the initial path doesn't have a DirEntry. But this of course shouldn't always cause it to be interpreted as a symlink.

Investigating a bit more, it looks like I introduced this bug in commit BurntSushi/walkdir@3732789, which appears motivated by the notion that the path given should always be followed, even if it's a symlink, which seems fine to me.

I think the question I have is why follow_link is used at all in the implementation of path_is_symlink. I'll have to noodle on that one.

And yeah, nice serde bug. :-)

@BurntSushi BurntSushi added the bug label Jul 17, 2018

BurntSushi added a commit that referenced this issue Aug 22, 2018

ignore: fix false positive in path_is_symlink
This commit fixes a bug where the first path always reported itself as
as symlink via `path_is_symlink`.

Part of this fix includes updating walkdir to 2.2.1, which also includes
a corresponding bug fix.

Fixes #984

BurntSushi added a commit that referenced this issue Aug 22, 2018

ignore: fix false positive in path_is_symlink
This commit fixes a bug where the first path always reported itself as
as symlink via `path_is_symlink`.

Part of this fix includes updating walkdir to 2.2.1, which also includes
a corresponding bug fix.

Fixes #984

BurntSushi added a commit that referenced this issue Aug 22, 2018

ignore: fix false positive in path_is_symlink
This commit fixes a bug where the first path always reported itself as
as symlink via `path_is_symlink`.

Part of this fix includes updating walkdir to 2.2.1, which also includes
a corresponding bug fix.

Fixes #984

BurntSushi added a commit that referenced this issue Aug 22, 2018

ignore: fix false positive in path_is_symlink
This commit fixes a bug where the first path always reported itself as
as symlink via `path_is_symlink`.

Part of this fix includes updating walkdir to 2.2.1, which also includes
a corresponding bug fix.

Fixes #984

BurntSushi added a commit to BurntSushi/walkdir that referenced this issue Nov 11, 2018

walkdir: fix root symlink bug
This commit fixes a nasty bug where the root path given to walkdir was
always reported as a symlink, even when 'follow_links' was enabled. This
appears to be a regression introduced by commit 6f72fce as part of
fixing BurntSushi/ripgrep#984.

The central problem was that since root paths should always be followed,
we were creating a DirEntry whose internal file type was always resolved
by following a symlink, but whose 'metadata' method still returned the
metadata of the symlink and not the target. This was problematic and
inconsistent both with and without 'follow_links' enabled.

We also fix the documentation. In particular, we make the docs of 'new'
more unambiguous, where it previously could have been interpreted as
contradictory to the docs on 'DirEntry'. Specifically, 'WalkDir::new'
says:

    If root is a symlink, then it is always followed.

But the docs for 'DirEntry::metadata' say

    This always calls std::fs::symlink_metadata.

    If this entry is a symbolic link and follow_links is enabled, then
    std::fs::metadata is called instead.

Similarly, 'DirEntry::file_type' said

    If this is a symbolic link and follow_links is true, then this
    returns the type of the target.

That is, if 'root' is a symlink and 'follow_links' is NOT enabled,
then the previous incorrect behavior resulted in 'DirEntry::file_type'
behaving as if 'follow_links' was enabled. If 'follow_links'
was enabled, then the previous incorrect behavior resulted in
'DirEntry::metadata' reporting the metadata of the symlink itself.

We fix this by correctly constructing the DirEntry in the first place,
and then adding special case logic to path traversal that will always
attempt to follow the root path if it's a symlink and 'follow_links'
was not enabled. We also tweak the docs on 'WalkDir::new' to be more
precise.

Fixes #115
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.