Directory.EnumerateFiles on Linux recurses infinitely with bad symlinks #97123

Closed
Deterous opened this issue Jan 18, 2024 · 5 comments
Labels
area-System.IO help wanted [up-for-grabs] Good issue for external contributors os-linux Linux OS (any supported distro)

Deterous commented Jan 18, 2024

Description

On Linux, Directory.GetFiles(path, "*", SearchOption.AllDirectories) and Directory.EnumerateFiles(path, "*", SearchOption.AllDirectories) both appear to recurse infinitely when an unreadable symlink (File -> '') is present, seemingly treating '' as the current directory and recursing into it. I confirmed this by limiting the recursion depth to 9 with EnumerationOptions: EnumerateFiles treated the symlink as a folder and recursed downwards 9 times.
.NET on Windows does not follow the bad symlink and does not recurse infinitely.

Reproduction Steps

Directory.EnumerateFiles(path, "*", SearchOption.AllDirectories) where path is a path to a folder containing an unreadable symlink to ''
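A minimal sketch of the depth-limited check described above, assuming a link to "." can stand in for the unreadable link to '' (Linux does not normally let you create a link with an empty target). The names File and a.txt are hypothetical placeholders, and MaxRecursionDepth = 9 mirrors the limit mentioned in the description.

```csharp
using System;
using System.IO;
using System.Linq;

// Scratch directory containing one regular file and one self-referencing link.
string directory = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
Directory.CreateDirectory(directory);
File.WriteAllText(Path.Combine(directory, "a.txt"), "");

// Assumption: "." stands in for the empty link target seen on the disc.
Directory.CreateSymbolicLink(Path.Combine(directory, "File"), ".");

// Cap the recursion so the enumeration finishes quickly.
var options = new EnumerationOptions
{
    RecurseSubdirectories = true,
    MaxRecursionDepth = 9
};

string[] files = Directory.EnumerateFiles(directory, "*", options).ToArray();
foreach (string path in files)
{
    Console.WriteLine(path);
}
```

On Linux, the same a.txt should show up more than once, each time behind a longer chain of File segments, which suggests the link is being treated as a directory.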

Expected behavior

Linux builds should follow Windows builds and not follow the bad symlink.

Actual behavior

Hangs indefinitely on Linux (although I assume it actually recurses int.MaxValue times)

Regression?

No response

Known Workarounds

No response

Configuration

net8.0, linux-x64

Other information

No response

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Jan 18, 2024

ghost commented Jan 18, 2024

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

@adamsitnik adamsitnik added this to the 9.0.0 milestone Jan 19, 2024
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Jan 19, 2024
@adamsitnik adamsitnik added os-linux Linux OS (any supported distro) help wanted [up-for-grabs] Good issue for external contributors labels Jan 19, 2024
Member

tmds commented Jan 25, 2024

Linux doesn't allow creating links that have an empty target path.

I think we may be seeing the expected behavior of the kernel when there is such a link in a filesystem.

I assume the empty path is treated the same as a link where the target path is ., that is: it points back to the directory.

Because we recurse into the same directory again through the link, stat may/should eventually return ELOOP (after about 40 levels), since the path traverses the link so many times.
At that point the entry should no longer be considered a directory (but a file), and the recursion should stop.

If you use FileSystemEnumerable (which these methods use under the hood), you can see how the traversal goes.
That will give you some visibility into the behavior.

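using System.IO.Enumeration;

// Assumption: the directory to inspect is passed as the first command-line argument.
string directory = args[0];

var fse = new FileSystemEnumerable<string>(
    directory,
    transform: (ref FileSystemEntry entry) => entry.ToFullPath(),
    options: new EnumerationOptions()
    {
        RecurseSubdirectories = true
    });

// Print every entry in the order the enumerator visits it.
foreach (string path in fse)
{
    Console.WriteLine(path);
}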

Member

tmds commented Jan 26, 2024

@Deterous can you run the above code on your directory?

I imagine it may do something similar to the following, where the link target is ".":

using System.IO.Enumeration;

string directory = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
Directory.CreateDirectory(directory);
Directory.CreateSymbolicLink(Path.Combine(directory, "self"), ".");

var fse = new FileSystemEnumerable<string>(
    directory,
    transform: (ref FileSystemEntry entry) => entry.ToFullPath(),
    options: new EnumerationOptions()
    {
        RecurseSubdirectories = true
    });

foreach (string path in fse)
{
    Console.WriteLine(path);
}

The output of this program on my system is:

/tmp/wv23ssgs.k3r/self
/tmp/wv23ssgs.k3r/self/self
...
/tmp/wv23ssgs.k3r/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self
/tmp/wv23ssgs.k3r/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self/self

And then it does stop (due to the ELOOP).

Notice that the paths get longer. That's because .NET doesn't do anything that relies on the link target path; it just appends the link name to the current directory path.

It would be an issue if .NET were using the link target, and that would cause the same path to show up here continuously. That is not the case.

Author

Deterous commented Jan 29, 2024

@tmds Thank you, I also assumed the empty path was treated as current directory.

Your output is similar to what I get, except that in my case three files are symlinked to '', so each output path is some combination of the three link names as the traversal descends. This, of course, takes a very long time, as I assume it can produce any combination of 1-40 file1's, 1-40 file2's, and 1-40 file3's. I did not wait for it to print all possible permutations. Even the above example with one added Directory.CreateSymbolicLink(Path.Combine(directory, "self2"), "."); becomes absurdly complex, let alone with 3 or more links.
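To put a rough number on that growth, here is a back-of-the-envelope sketch. It assumes every level contains the same k self-referencing links and that traversal stops after 40 levels (the approximate ELOOP threshold mentioned earlier in the thread), so level d contributes k^d paths. CountPaths is a hypothetical helper, not part of .NET, and the real count may differ somewhat.

```csharp
using System;
using System.Numerics;

// Sum of links^depth for depth = 1..maxDepth: one path per sequence of link names.
static BigInteger CountPaths(int links, int maxDepth)
{
    BigInteger total = 0, perLevel = 1;
    for (int depth = 1; depth <= maxDepth; depth++)
    {
        perLevel *= links;   // number of paths at this depth
        total += perLevel;
    }
    return total;
}

for (int links = 1; links <= 3; links++)
{
    Console.WriteLine($"{links} link(s): {CountPaths(links, 40)} paths");
}
```

Under these assumptions one link gives 40 paths, two links give roughly 2.2 trillion, and three links are out of reach for any practical enumeration.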

For context, these files exist on a UDF filesystem on a DVD-ROM, I did not create these symlinks nor do I know how they were created.

Is this then "expected" dotnet behaviour? Can anything be done to prevent this unnecessary recursion?

Member

tmds commented Jan 29, 2024

Can anything be done to prevent this unnecessary recursion?

For some use-cases you don't actually want to follow links.

There is an API proposal for adding an option to control whether links should be followed: #52666.

You can implement something yourself using FileSystemEnumerable and setting the ShouldRecursePredicate.

To skip recursing into symlinks with ShouldRecursePredicate on Unix, you can check for FileAttributes.ReparsePoint.
On Windows, reparse points cover several kinds of entries, so you may need to consider whether you want to skip all of them.
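A minimal sketch of that suggestion, assuming you want EnumerateFiles-like results (files only) and never want to descend through a link. EnumerateFilesSkippingLinks and the data.txt / self / self2 names are hypothetical and used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Enumeration;
using System.Linq;

// Scratch directory with one regular file and two self-referencing links.
string directory = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
Directory.CreateDirectory(directory);
File.WriteAllText(Path.Combine(directory, "data.txt"), "hello");
Directory.CreateSymbolicLink(Path.Combine(directory, "self"), ".");
Directory.CreateSymbolicLink(Path.Combine(directory, "self2"), ".");

List<string> files = EnumerateFilesSkippingLinks(directory).ToList();
foreach (string path in files)
{
    Console.WriteLine(path);
}

// Hypothetical helper: recursive file listing that does not recurse into reparse points.
static IEnumerable<string> EnumerateFilesSkippingLinks(string root)
{
    return new FileSystemEnumerable<string>(
        root,
        (ref FileSystemEntry entry) => entry.ToFullPath(),
        new EnumerationOptions { RecurseSubdirectories = true })
    {
        // Return only entries that are not directories (links to directories are left out too).
        ShouldIncludePredicate = (ref FileSystemEntry entry) => !entry.IsDirectory,

        // Do not descend into anything flagged as a reparse point (symlinks on Unix).
        ShouldRecursePredicate = (ref FileSystemEntry entry) =>
            (entry.Attributes & FileAttributes.ReparsePoint) == 0
    };
}
```

With the two links in place, only data.txt should be listed, and it should be listed once. On Windows the same predicate would also skip other reparse point types, which may or may not be what you want.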

I'll close the issue as there is nothing to be changed on the .NET side.

@tmds tmds closed this as completed Jan 29, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Feb 29, 2024