Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FileInfo/DirectoryInfo: enable transparent handling of symbolic links #52908

Open
Tracked by #57205 ...
tmds opened this issue May 18, 2021 · 34 comments
Open
Tracked by #57205 ...

FileInfo/DirectoryInfo: enable transparent handling of symbolic links #52908

tmds opened this issue May 18, 2021 · 34 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.IO needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration
Milestone

Comments

@tmds
Copy link
Member

tmds commented May 18, 2021

Background and Motivation

Users may not be aware that the fileName/path they are providing is a symbolic link.

This proposal makes it possible to operate on such path without the user having to manually handle symbolic links using the APIs added in #24271.

Proposed API

A new constructor on FileInfo allows to specify whether the instance should return information about the target instead of the symbolic link.

class FileInfo
{
   public FileInfo(string fileName, bool followLinks) { }
}
class DirectoryInfo
{
    public DirectoryInfo(string path, bool followLinks) { }
}

The argument causes information for the link target to be returned for the following properties. When there is no target they throw no such file.

class FileInfo
{
    public bool IsReadOnly { get { throw null; } set { } }
    public long Length { get { throw null; } }
}
class FileSystemInfo
{
    public FileAttributes Attributes { get { throw null; } set { } }
    public DateTime CreationTime { get { throw null; } set { } }
    public DateTime CreationTimeUtc { get { throw null; } set { } }
    public abstract bool Exists { get; }
    public DateTime LastAccessTime { get { throw null; } set { } }
    public DateTime LastAccessTimeUtc { get { throw null; } set { } }
    public DateTime LastWriteTime { get { throw null; } set { } }
    public DateTime LastWriteTimeUtc { get { throw null; } set { } }

    // from https://github.com/dotnet/runtime/issues/24271
    // always returns `false` when `followLinks` is `true`.
    public bool IsSymbolicLink { get; } 
}

It does not affect the following operations. Which are either:

  • always working on the target, or
  • working on the 'fileName'/'path' argument.
class DirectoryInfo
{
    // applies to 'path':
    public DirectoryInfo? Parent { get { throw null; } }
    public DirectoryInfo Root { get { throw null; } }

    // applies to 'target':
    public void Create() { }
    public DirectoryInfo CreateSubdirectory(string path) { throw null; }
    public IEnumerable<DirectoryInfo> EnumerateDirectories() { throw null; }
    public IEnumerable<DirectoryInfo> EnumerateDirectories(string searchPattern) { throw null; }
    public IEnumerable<DirectoryInfo> EnumerateDirectories(string searchPattern, EnumerationOptions enumerationOptions) { throw null; }
    public IEnumerable<DirectoryInfo> EnumerateDirectories(string searchPattern, SearchOption searchOption) { throw null; }
    public IEnumerable<FileInfo> EnumerateFiles() { throw null; }
    public IEnumerable<FileInfo> EnumerateFiles(string searchPattern) { throw null; }
    public IEnumerable<FileInfo> EnumerateFiles(string searchPattern, EnumerationOptions enumerationOptions) { throw null; }
    public IEnumerable<FileInfo> EnumerateFiles(string searchPattern, SearchOption searchOption) { throw null; }
    public IEnumerable<FileSystemInfo> EnumerateFileSystemInfos() { throw null; }
    public IEnumerable<FileSystemInfo> EnumerateFileSystemInfos(string searchPattern) { throw null; }
    public IEnumerable<FileSystemInfo> EnumerateFileSystemInfos(string searchPattern, EnumerationOptions enumerationOptions) { throw null; }
    public IEnumerable<FileSystemInfo> EnumerateFileSystemInfos(string searchPattern, SearchOption searchOption) { throw null; }
    public DirectoryInfo[] GetDirectories() { throw null; }
    public DirectoryInfo[] GetDirectories(string searchPattern) { throw null; }
    public DirectoryInfo[] GetDirectories(string searchPattern, EnumerationOptions enumerationOptions) { throw null; }
    public DirectoryInfo[] GetDirectories(string searchPattern, SearchOption searchOption) { throw null; }
    public FileInfo[] GetFiles() { throw null; }
    public FileInfo[] GetFiles(string searchPattern) { throw null; }
    public FileInfo[] GetFiles(string searchPattern, EnumerationOptions enumerationOptions) { throw null; }
    public FileInfo[] GetFiles(string searchPattern, SearchOption searchOption) { throw null; }
    public FileSystemInfo[] GetFileSystemInfos() { throw null; }
    public FileSystemInfo[] GetFileSystemInfos(string searchPattern) { throw null; }
    public FileSystemInfo[] GetFileSystemInfos(string searchPattern, EnumerationOptions enumerationOptions) { throw null; }
    public FileSystemInfo[] GetFileSystemInfos(string searchPattern, SearchOption searchOption) { throw null; }

    // applies to 'path':
    public void Delete(bool recursive) { }
    public override void Delete() { }
    public void MoveTo(string destDirName) { }
}
class FileInfo
{
    // applies to 'fileName'
    public DirectoryInfo? Directory { get { throw null; } }
    public string? DirectoryName { get { throw null; } }

    // applies to 'target':
    public StreamWriter AppendText() { throw null; }
    public FileInfo CopyTo(string destFileName) { throw null; }
    public FileInfo CopyTo(string destFileName, bool overwrite) { throw null; }
    public FileStream Create() { throw null; }
    public StreamWriter CreateText() { throw null; }
    public void Decrypt() { }
    public void Encrypt() { }
    public FileStream Open(FileMode mode) { throw null; }
    public FileStream Open(FileMode mode, FileAccess access) { throw null; }
    public FileStream Open(FileMode mode, FileAccess access, FileShare share) { throw null; }
    public FileStream OpenRead() { throw null; }
    public StreamReader OpenText() { throw null; }
    public FileStream OpenWrite() { throw null; }
    public FileInfo Replace(string destinationFileName, string? destinationBackupFileName) { throw null; }
    public FileInfo Replace(string destinationFileName, string? destinationBackupFileName, bool ignoreMetadataErrors) { throw null; }

    // applies to 'path':
    public override void Delete() { }
    public void MoveTo(string destFileName) { }
    public void MoveTo(string destFileName, bool overwrite) { }
}
class FileSystemInfo
{
    // applies to 'fileName'/'path':
    protected string FullPath;
    protected string OriginalPath;
    public string Extension { get { throw null; } }
    public virtual string FullName { get { throw null; } }
    public abstract string Name { get; }
}

Usage Examples

FileInfo file = new FileInfo("/tmp/my-file", followLinks: true);
DateTime lastWriteTime = file.LastWriteTimeUtc;
while (true)
{
    Thread.Sleep(1000);
    file.Refresh();
    DateTime writeTime = file.LastWriteTimeUtc;
    if (writeTime != lastWriteTime)
    {
        Console.WriteLine("The file has changed");
        lastWriteTime = writeTime;
    }
}

Implementation

On Linux, this means the information for the properties is retrieved using stat instead of vstat.
For Delete, the final target is first located, e.g. by calling realpath, and that is then deleted.

edits:

  • added IsSymbolicLink from Proposed API for symbolic links #24271
  • renamed followLink to followLinks.
  • updated proposal to make Delete target based.
  • added implementation section which covers Linux
  • revert Delete to be path based again.
  • place MoveTo under path-based
@tmds tmds added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label May 18, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added area-System.IO untriaged New issue has not been triaged by the area owner labels May 18, 2021
@ghost
Copy link

ghost commented May 18, 2021

Tagging subscribers to this area: @carlossanlop
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and Motivation

Users may not be aware that the fileName/path they are providing is a symbolic link.

This proposal makes it possible to operate on such path without the user having to manually handle symbolic links using the APIs added in #24271.

Proposed API

A new constructor on FileInfo allows to specify whether the instance should return information about the target instead of the symbolic link.

class FileInfo
{
   public FileInfo(string fileName, bool followLink) { }
}
class DirectoryInfo
{
    public DirectoryInfo(string path, bool followLink) { }
}

The argument causes information for the link target to be returned for the following properties:

class FileInfo
{
    public bool IsReadOnly { get { throw null; } set { } }
    public long Length { get { throw null; } }
}
class FileSystemInfo
{
    public FileAttributes Attributes { get { throw null; } set { } }
    public DateTime CreationTime { get { throw null; } set { } }
    public DateTime CreationTimeUtc { get { throw null; } set { } }
    public abstract bool Exists { get; }
    public DateTime LastAccessTime { get { throw null; } set { } }
    public DateTime LastAccessTimeUtc { get { throw null; } set { } }
    public DateTime LastWriteTime { get { throw null; } set { } }
    public DateTime LastWriteTimeUtc { get { throw null; } set { } }
}

It does not affect the following operations. Which are either:

  • always working on the target, or
  • working on the 'fileName'/'path' argument.
class DirectoryInfo
{
    // applies to 'path':
    public DirectoryInfo? Parent { get { throw null; } }
    public DirectoryInfo Root { get { throw null; } }

    // applies to 'target':
    public void Create() { }
    public DirectoryInfo CreateSubdirectory(string path) { throw null; }
    public void Delete(bool recursive) { }
    public IEnumerable<DirectoryInfo> EnumerateDirectories() { throw null; }
    public IEnumerable<DirectoryInfo> EnumerateDirectories(string searchPattern) { throw null; }
    public IEnumerable<DirectoryInfo> EnumerateDirectories(string searchPattern, EnumerationOptions enumerationOptions) { throw null; }
    public IEnumerable<DirectoryInfo> EnumerateDirectories(string searchPattern, SearchOption searchOption) { throw null; }
    public IEnumerable<FileInfo> EnumerateFiles() { throw null; }
    public IEnumerable<FileInfo> EnumerateFiles(string searchPattern) { throw null; }
    public IEnumerable<FileInfo> EnumerateFiles(string searchPattern, EnumerationOptions enumerationOptions) { throw null; }
    public IEnumerable<FileInfo> EnumerateFiles(string searchPattern, SearchOption searchOption) { throw null; }
    public IEnumerable<FileSystemInfo> EnumerateFileSystemInfos() { throw null; }
    public IEnumerable<FileSystemInfo> EnumerateFileSystemInfos(string searchPattern) { throw null; }
    public IEnumerable<FileSystemInfo> EnumerateFileSystemInfos(string searchPattern, EnumerationOptions enumerationOptions) { throw null; }
    public IEnumerable<FileSystemInfo> EnumerateFileSystemInfos(string searchPattern, SearchOption searchOption) { throw null; }
    public DirectoryInfo[] GetDirectories() { throw null; }
    public DirectoryInfo[] GetDirectories(string searchPattern) { throw null; }
    public DirectoryInfo[] GetDirectories(string searchPattern, EnumerationOptions enumerationOptions) { throw null; }
    public DirectoryInfo[] GetDirectories(string searchPattern, SearchOption searchOption) { throw null; }
    public FileInfo[] GetFiles() { throw null; }
    public FileInfo[] GetFiles(string searchPattern) { throw null; }
    public FileInfo[] GetFiles(string searchPattern, EnumerationOptions enumerationOptions) { throw null; }
    public FileInfo[] GetFiles(string searchPattern, SearchOption searchOption) { throw null; }
    public FileSystemInfo[] GetFileSystemInfos() { throw null; }
    public FileSystemInfo[] GetFileSystemInfos(string searchPattern) { throw null; }
    public FileSystemInfo[] GetFileSystemInfos(string searchPattern, EnumerationOptions enumerationOptions) { throw null; }
    public FileSystemInfo[] GetFileSystemInfos(string searchPattern, SearchOption searchOption) { throw null; }
    public void MoveTo(string destDirName) { }

    // applies to 'path':
    public override void Delete() { }
}
class FileInfo
{
    // applies to 'fileName'
    public DirectoryInfo? Directory { get { throw null; } }
    public string? DirectoryName { get { throw null; } }

    // applies to 'target':
    public StreamWriter AppendText() { throw null; }
    public FileInfo CopyTo(string destFileName) { throw null; }
    public FileInfo CopyTo(string destFileName, bool overwrite) { throw null; }
    public FileStream Create() { throw null; }
    public StreamWriter CreateText() { throw null; }
    public void Decrypt() { }
    public void Encrypt() { }
    public void MoveTo(string destFileName) { }
    public void MoveTo(string destFileName, bool overwrite) { }
    public FileStream Open(FileMode mode) { throw null; }
    public FileStream Open(FileMode mode, FileAccess access) { throw null; }
    public FileStream Open(FileMode mode, FileAccess access, FileShare share) { throw null; }
    public FileStream OpenRead() { throw null; }
    public StreamReader OpenText() { throw null; }
    public FileStream OpenWrite() { throw null; }
    public FileInfo Replace(string destinationFileName, string? destinationBackupFileName) { throw null; }
    public FileInfo Replace(string destinationFileName, string? destinationBackupFileName, bool ignoreMetadataErrors) { throw null; }

    // applies to 'fileName':
    public override void Delete() { }
}
class FileSystemInfo
{
    // applies to 'fileName'/'path':
    protected string FullPath;
    protected string OriginalPath;
    public string Extension { get { throw null; } }
    public virtual string FullName { get { throw null; } }
    public abstract string Name { get; }
}

Usage Examples

FileInfo file = new FileInfo("/tmp/my-file", followLink: true);
DateTime lastWriteTime = file.LastWriteTimeUtc;
while (true)
{
    Thread.Sleep(1000);
    file.Refresh();
    DateTime writeTime = file.LastWriteTimeUtc;
    if (writeTime != lastWriteTime)
    {
        Console.WriteLine("The file has changed");
        lastWriteTime = writeTime;
    }
}
Author: tmds
Assignees: -
Labels:

api-suggestion, area-System.IO, untriaged

Milestone: -

@mklement0

This comment has been minimized.

@tmds
Copy link
Member Author

tmds commented May 19, 2021

I think it is problematic to create hybrid FileSystemInfo instances that combine properties of the link (.Name, .FullName, ...) with properties of its target.

I don't think this is problematic. I think it is desired to not expose a symlink was involved.
The path that is passed to the constructor is treated as an invariant.

What happens if the target doesn't exist?

No such file exception.

What is the implied recursion mode of followLink: true?

Return information about the final target.

On the other hand, (c) is usually supported by a single platform-native call, such as GetFinalPathNameByHandle on Windows and realpath() function on Unix-like platforms , so from a performance perspective it may be preferable.

For Unix, it should be about using stat instead of lstat. I don't think we need realpath.
I don't know about Windows.

@iSazonov
Copy link
Contributor

This seems less flexible and more confusing than #24271. Especially disturbing is that we set the behavior "before", but then we don't even know about it (i.e. if then there are two code paths, we can't choose the right one). In other words it's not so easy to implement the "find all symbolic links" scenario.

@tmds
Copy link
Member Author

tmds commented May 19, 2021

@iSazonov this is about working with files and directories regardless whether the path is or is not using a symbolic link.
#24271 is about working with symbolic links.
These are different use-cases.

@mklement0

This comment has been minimized.

@tmds
Copy link
Member Author

tmds commented May 19, 2021

However, I think the proposed hybrid file-info approach is problematic, because it violates a fundamental assumption: that the properties of a given FileSystemInfo represent the file-system entry pointed to by its path.

How should these properties behave when the path does not exist, or when the link is broken?
Should these properties update on Refresh?

I think it is valid for these properties to reflect the path that was passed to the constructor and maintain the behavior that has always existed here. Changing that can also cause problems.

If the final target is desired, maybe add a property for that. For example: string RealPath { get; } (which can call realpath on Linux).

As proposed, one wouldn't be able to tell whether a given FileSystemInfo instance reflects the link's or the target's properties

They reflect the target independent of whether the path is a link or not.

@mklement0

This comment has been minimized.

@tmds
Copy link
Member Author

tmds commented May 19, 2021

Thus, setting followLink to true would simply return a FileSystemInfo instance representing the (ultimate, non-link) target, as if that target had been requested directly.

So the constructor throws if the path does not exist, or the link is broken?
And Refresh would be performed against the target path?

If I ask for link ./foo with followLink = true, and the resulting FileInfo instance then tells me that ./foo isn't a link, that is a problem.

I don't understand the problem. followLink is meant to make the properties reflect the target of the link, not the link itself.

you shouldn't have to guess at / investigate whether its properties are a 1:1 or a a hybrid representation.

Some properties are proposed to keep matching the path that was passed to the constructor. There is no guessing.

@mklement0

This comment has been minimized.

@iSazonov
Copy link
Contributor

@iSazonov this is about working with files and directories regardless whether the path is or is not using a symbolic link.
#24271 is about working with symbolic links.
These are different use-cases.

It looks like an insidious trick :-) when some properties and methods return information about the source object and others about the target object. This approach opens up unlimited ways for bugs in applications.
This is all the more useless because most APIs already work transparently with symbolic links.

(And please add more use-cases in OP - your current demo says nothing.)

@tmds
Copy link
Member Author

tmds commented May 20, 2021

If the target doesn't exist: That makes sense to me, and I thought it was also what you were proposing.

I'd take the path as is, just like it happens now. FileInfo constructor doesn't throw if the path does not exist.

This approach opens up unlimited ways for bugs in applications.

@iSazonov you say this causes bugs. @mklement0 you call it problematic. Can you demonstrate issues with some code?

I propose to keep invariant that has existed between the constructor argument and properties. Changing those definitely creates opportunity for bugs with existing code.

I don't see the value of making these properties reflect the target path. .Get[Link]Target(returnFinalTarget: true) can be used to get it.

This is all the more useless because most APIs already work transparently with symbolic links.

Except the ones that are part of this proposal.

And please add more use-cases

Other example:

const string filename = "/tmp/filename";
File.WriteAllText(filename, "content");

var fi = new FileInfo(filename, followsLink: true);
Console.WriteLine("File length is " + fi.Length);

// Make path a symbolic link to the same content.
const string target = "/tmp/target";
File.Move(filename, target);
File.CreateLink(target, filename); // use API from https://github.com/dotnet/runtime/issues/24271

fi.Refresh();
Console.WriteLine("File length is " + fi.Length);

@mklement0

This comment has been minimized.

@tmds
Copy link
Member Author

tmds commented May 20, 2021

Users may not be aware that the fileName/path they are providing is a symbolic link.

The goal of the proposal is the user doesn't need to be aware they are providing a symbolic link or not.

To me, followLink: true signals two expectations: (a) the input path is a link (which presupposes the path's existence) and (b) its target exists.

So these expectations are not true.

It is defensible to me that if the target path exists but isn't a link,

And non links are definitely meant to be used with this constructor.

A FileSystemInfo instance's FullName property uniquely identifies the file-system entry it represents.

With this goal in mind, I think it is desired for these properties to reflect the path that was passed to the constructor.

And definitely it's desired for Refresh to work against that path and not against the target path.
That is what FileSystemInfo does today: with Refresh an instance can change the file-system entry it represents. The invariant is the constructor path.

@tmds
Copy link
Member Author

tmds commented May 20, 2021

given that what I'm suggesting means that the resulting FileSystemInfo represent the target only.

This is what the GetSymbolicLinkTarget methods from #24271 (comment) do, right?

@mklement0

This comment has been minimized.

@iSazonov
Copy link
Contributor

@iSazonov you say this causes bugs. @mklement0 you call it problematic. Can you demonstrate issues with some code?

fileinfo.Delete() - nobody know what is an object deleted. If this object walks around a large/complex codebase, it's easy to fall into a bug.

The goal of the proposal is the user doesn't need to be aware they are providing a symbolic link or not.

It is interesting proposal but really many APIs already resolve symbolic links transparently and we have no need the implicit magic in the FileInfo API - what we want is to get more possibilities to work with symbolic links as end objects.

@tmds
Copy link
Member Author

tmds commented May 21, 2021

fileinfo.Delete() - nobody know what is an object deleted. If this object walks around a large/complex codebase, it's easy to fall into a bug.

I agree. I find it also ambiguous with the default constructor:

File.CreateLink("target", "link");
FileInfo fi = new FileInfo("link");
fi.Create(); // creates "target"
fi.Delete(); // deletes "link"

Maybe something explicit should be added like DeleteTarget().

@mklement0

This comment has been minimized.

@mklement0

This comment has been minimized.

@KevinCathcart
Copy link
Contributor

KevinCathcart commented May 28, 2021

I disagree with a target based Delete (and target based MoveTo). My understanding is that the intent here is to let applications see symlinks as basically just a normal file, and not need to worry about the fact they might be a link.

I'd argue the correct approach for an API like this is to make symbolic links act as much like hard-links as feasible (Note: for this analogy we are assuming all the interesting properties other than path live on the inode or whatever the most equivalent structure in the file system is, and thus are shared by all hard-links. We are also ignoring the fact that nearly all platforms/file systems disallow directory hard-links. [0])

Behaving just like normal files/directories is literally true of hard-links, because by definition they are just normal files/directories. There is no "this one is the link, that one is the target" distinction. They both/all are just links the the underlying inode, just like every file and directory is. (The exceptions are when a VFS layer steps in for things like mount points, or other reparse-point like constructs not known to the underlying file system.)

For hard-links, opening the file opens the "target" (the inode). Editing the dates or permissions edits the "target". Opening the hard-link (in either the access file contents, or traverse directory structure sense) opens the "target". All of those match the proposed behavior. However, moving or deleting hard-links affects exactly the path you supply, and does not affect some arbitrary other path on the filesystem.

This analogy cannot be completely perfect, as broken hard-links are not supposed to be a thing than can exist (they imply some form of file system corruption). But treating them broken symlinks symmetrically, with most of the properties, open method, etc acting like any other FileSystemInfo created referencing a non-existent file/directory feels reasonable.


Something mostly unrelated that is also worth pointing out is that to really work the way one would want, any FileSystemInfo objects returned from one that created with this flag would probably also want to be created with the same flag, as the user would have no way to specify the constructor parameter otherwise.

That is not too big a problem, but it does mean that if any library is using these types as exchange types, they could get different behavior if they are passed in a copy with this flag set or not, and it would only affect just the file passed in, but could also affect other files reached via the one passed in. I know the that the framework generally does not use these as exchange types, but I suspect there are user libraries who do use them. While not a library, powershell is one place that uses them heavily as an exchange type.

Footnote [0] As long as they don't form loops, they are actually relatively well behaved on most systems. Of course, they can really confuse tools, since they allow things like a file with only one hardlink to have multiple canonical paths, directories no longer have a single unambigious parent, etc. But despite these issues, they mostly work.

One caveat is deletion, due to the typical restriction of only deleting empty directories. An that allows them should allow deleting populated directories if there are other hardlinks (not any counting the special "." and ".." hardlinks seen in some Unix Filesystems, as the normal exception to the no hardlink rule). But many tools would in practice recursively delete the contents, before even trying to unlink the folder itself.

@KevinCathcart
Copy link
Contributor

Replying to #24271 (comment):

(With respecting to defaulting to hybrid version):

while definitely a breaking change, to me it falls into - to borrow PowerShell's classification - bucket 3, i.e. a change that is unlikely to impact existing code, because the current behavior is unhelpful.

The fact that one of the applications it would break is PowerShell itself makes me very skeptical that the team would find it acceptable. When a user runs Get-ChildItems to see the contents of a directory, they expect the values shown to represent the entries in the directory. That is the behavior of "dir" and "ls" in other shells, and both are aliased by default in PowerShell to Get-ChildItems. That means the attributes on a symlink must represent the symlink itself. And guess what datatype is returned there? Yep, FileInfo/DirectoryInfo. Having this expose a LinkInfo for the links might be acceptable (but I would be worried that it could break scripts.)

Unfortunately that would require a time machine. Existing PowerShell builds only enumerate Directories or Files, so could never get a LinkInfo. Because framework-dependent versions of PowerShell (like the dotnet global tools version) roll forward to new major releases if their preferred version is not installed, the existing code could end up running on the newer version, and would then be broken.

If PowerShell had used EnumerateFileSystemInfos and that API were to be updated to return LinkInfo objects by default it might not have been a problem. Except, of course, that there are plenty of other existing applications that would likely start breaking, because they assume they will only get back FileInfo or DirectoryInfo. And I'm betting you would not want to be returning LinkInfo objects by default.

But in any case, PowerShell is certainly not the only tool out there that wants to produce accurate directory listings, where attributes on a symlink show as the attributes of the link itself, so even if PowerShell had avoided it, others could be impacted.

@KevinCathcart
Copy link
Contributor

Another consideration (that is not necessarily a problem):

As mentioned in passing the first API review for #24271, on Windows fully resolving symlinks can cause network access, with both security and performance implications. This is because they can point at a UNC path. One of the major perf considerations is that the server in question could be one that that is basically a black hole. Resolving such a symlink would basically pause the application until a network timeout occurs. (Perhaps you remember how applications used to lock up, if you accidentally selected an empty or even non-existent floppy drive, while it checked for a disk? Or perhaps a similar, but less common, occurrence with optical drives? Now imagine a pause that could be even longer.) That means that APIs if like DirectoryInfo.EnumerateFiles resolved each file it returns it would need to lazily resolve symlinks in some cases, otherwise a directory full of UNC symlinks to a blackhole servers could be an extremely effective denial of service attack vector.

This is mostly fine, because these classes always lazily resolve the properties, and there is no reason not to do that in this scenario either. True, but it does mean that accessing the properties might throw, since the symlink in question could be part of a looping chain of symlinks, and thus not able to be fully resolved. I suppose the other option to to treat this case like the target not existing instead of throwing, which might also be reasonable.

The potential problem actually lies with the POSIX platforms, since there it is a requirement to fully resolve a symlink to even know if it is a file or a directory (which very much matters when doing this hybrid approach, since unfortunately typeof(DirectoryInfo) != typeof(FileInfo)). A similar network access scenario is possible in POSIX platforms, for nearly exactly the same reasons, with the caveat that unlike windows, such paths are not enabled by default. Paths like /smb/SERVERNAME/SHARENAME/path/to/file.ext have to be manually enabled on POSIX platforms with software like autofs.

@mklement0
Copy link

mklement0 commented May 28, 2021

I disagree with a target based Delete (and target based MoveTo)

So do I - .Delete() currently only acts on the link and that's how it should stay even with a hybrid representation.
Ditto for .MoveTo() , which, to my shock, currently acts on the target, which is arguably a bug [fortunately, this isn't true - I must have made a mistake].

As for not defaulting to hybrid representations: good points, I agree.

@mklement0
Copy link

@KevinCathcart: With requiring opt-in to the hybrid representation, my original concern returns: if you're handed a preexisting FileInfo / DirectoryInfo instance based on a link, you need to be able to tell if it represents the link itself or a hybrid view. So the question is: What is the best way to do enable that?

@mklement0
Copy link

mklement0 commented May 29, 2021

To answer my own question, returning to my original idea:

If we make new FileInfo(path, followLinks: true behave like new FileInfo.ResolveLinkTarget(returnFinalTarget: true) (ditto for DirectoryInfo), the problem of situationally different instances referring to the same path is bypassed.

However - assuming the path refers to a link whose target exists - you then need to be aware that you're dealing with a non-link representation whose .Delete() method deletes the original, so to speak, not a link to it.

Also, Name and FullPath will then by definition reflect the target, not the link input path (which I think isn't a problem, given the opt-in nature of the link-target resolution).

While an existing non-link path could just act as its own target, the question returns as to how to handle
new FileInfo(nonExistentPath, followLinks: true and new FileInfo(linkPathWithNonexistentTarget, followLinks: true

Unlike what I proposed earlier, it is probably better to retain the existing behavior of the parameter-less constructor, which quietly accepts a non-existent path.

This enables:

  • routinely passing followLinks: true (or better returnFinalTarget: true) to the FileSystemInfo constructors without fear of an exception, in line with the current behavior; it is the closest, of necessity opt-in thing to the desired hybrid representation, given the backward-compatibility constraints.

  • conversely, it potentially requires inspection of the instance returned to distinguish all possible existence modes, if you will; the following is expressed in terms of the new APIs approved in Proposed API for symbolic links #24271 and the current .Exists behavior:

non-link path link path
is non-link: info.LinkTarget is null is link: info.LinkTarget is not null
also exists: info.LinkTarget is null && ! info.Exists target exists: info.LinkTarget is not null && info.ResolveLinkTarget(returnFinalTarget: true)!.Exists

Note:

  • By existing target I mean: a given link's ultimate target that isn't another link (and is of the type implied by the FileSystemInfo subclass at hand: file or dir.)

@carlossanlop, as you can see, the above is both cumbersome and obscure:

  • Having to call info.LinkTarget is null to distinguish between a non-link and a link in the abstract is verbose and inefficient, given that the target path has to be retrieved, even though it isn't needed.

    • Checking for the ReparsePoint attribute is not sufficient, because not all reparse points on Windows are also links (e.g., a OneDrive reparse point isn't a link - it is "rehydrated" (downloaded on demand) in place, I presume).

    • The ReparseTag proposed in Expose reparse point tags #1908 also doesn't help, because you'd have to either test for all reparse tags that are also links (MountPoint, SymLink, ...) or - obscurely - test the value for the name-surrogate bit: info.ReparseTag & 0x20000000

  • Similarly, to make desired-state existence checking easier, we should consider introducing a FileSystemInfo.TargetExists property which for non-link items reports the path's existence itself, and for links reports the target's type-appropriate existence.

    • .Exists can then be used to test the input path's own existence only, irrespective of whether it refers to a non-link or link (and, in the latter case, irrespective of whether that link's target exists).
      • This is how it already works, except that on Unix the target must be checked too, in order to infer the link's type, which - unfortunately, but unavoidably - means that a broken Unix symlink always presents as a file.
    • .TargetExists can then be used to test if the path ultimately refers to a (non-link) item of the implied type (file vs. dir)
      • A separate property is also called for as a deliberate opt-in to this logic, given that looking for the ultimate target can be a potentially slow, blocking-for-an-unknown-duration target operation, as described by @KevinCathcart above.

@tmds
Copy link
Member Author

tmds commented May 31, 2021

I've changed back from target-based Delete to path-based Delete.

@Jozkee
Copy link
Member

Jozkee commented Jul 23, 2021

I wonder if we could instead add APIs for querying link's target file/directory information and return that in a minimal wrapper (FileSystemEntry) and keep the APIs focused in doing as less allocations as possible.
We could have the following APIs:

public static class File
{
    public static FileSystemEntry GetFinalInformation(ReadOnlySpan<char> path);
}

public static class Directory
{
    public static FileSystemEntry GetFinalInformation(ReadOnlySpan<char> path);
}

public class FileSystemInfo
{
    public FileSystemEntry FinalInformation => default;
}

Notes:

  • FileSystemEntry is ref struct and also is contained in the System.IO.Enumeraiton namespace, therefore it may not be a good idea to use it as the return type.
  • We could create a type very similar to FileSystemEntry that remains as minimal as possible:
public readonly struct MinimalFileSystemInfo
{
    // On Windows this can be obtained with GetFileInformationByHandleEx(lpFileInformation: FileBasicInfo).
    // On Unix this can be obtained with stat.
    public DateTimeOffset CreationTimeUtc;
    public DateTimeOffset LastAccessTimeUtc;
    public DateTimeOffset LastWriteTimeUtc;
    public FileAttributes Attributes;
}

cc @ericstj @carlossanlop

@ericstj
Copy link
Member

ericstj commented Jul 23, 2021

FileSystemEntry is ref struct and also is contained in the System.IO.Enumeraiton namespace, therefore it may not be a good idea to use it as the return type.

Would it make sense to consider this scenario as an extension of System.IO.Enumeraiton if that namespace is already dealing with this type of thing for non-symlinks?

@Jozkee
Copy link
Member

Jozkee commented Jul 23, 2021

if that namespace is already dealing with this type of thing for non-symlinks?

Do you mean if FileSystemEntry already returns the information of the link target? If so, it does not, it returns information for the link itself.

@ericstj
Copy link
Member

ericstj commented Jul 23, 2021

I mean why not add link target support to the enumeration APIs.

@Jozkee
Copy link
Member

Jozkee commented Jul 23, 2021

We can add support for enumeration scenarios as well, I was thinkking in an instance method that returns a new FileSystemEntry or returns the same instance if not a link:

public ref struct FileSystemEntry
{
    public FileSystemEntry GetFinalLinkTarget();
}

I don't know if we should care for the non-final target scenario.

@iSazonov
Copy link
Contributor

I think an enhancement of FileSystemEntry looks right direction as implementation details for both enumeration and explicit (sym)link manipulations. See my old comment #52666 (comment).
To resolve conflicts between properties (like LastAccessTimeUtc) of link itself and link target it would great to have public class SymbolicLink.

@tmds
Copy link
Member Author

tmds commented Aug 17, 2021

I wonder if we could instead add APIs for querying link's target file/directory information and return that in a minimal wrapper (FileSystemEntry) and keep the APIs focused in doing as less allocations as possible.

My focus is to eliminate the system calls that hop links in ResolveLinkTarget and perform a single stat call.
For FileInfo I don't think there are many allocations besides the FileInfo instance itself.

FileInfo being a class makes it usable in more cases than a ref struct.
Using FileInfo also makes it easy to change existing code where the intent was to get info for the target rather than the link.

System.IO.Enumeration may also need improving/extending to better support links. So maybe both are warranted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.IO needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration
Projects
None yet
Development

No branches or pull requests

7 participants