Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Unix FileStatus flag retrieval without perf regression #47348

Closed
wants to merge 31 commits into from

Conversation

carlossanlop
Copy link
Member

@carlossanlop carlossanlop commented Jan 22, 2021

Fixes #37301
Addresses #41739

If you are reviewing this PR, please take a look at each commit individually (small incremental changes).

FileSystemEntry and FileSystemInfo were not reporting their hidden flags correctly on Linux and MacOS.

On Linux and MacOS, the hidden attribute can be set by prefixing the file or folder name with a dot.

On MacOS and Windows, the hidden attribute can be set by prefixing the Hidden flag on the file or folder.

The original PR introduced a perf regression due to a new stat call.

The FileStatus and FileSystemEntry initialization code had duplicate code when setting flags. The right way to fix it is to move all the stat and lstat calling inside FileStatus.

FileSystemEntry also has some special needs when detecting directories and symbolic links: since it has access to the InodeType, some info can be inferred from them, saving stat calls. So that logic had to be preserved in the FileSystemEntry initialization method, while deferring as much as possible to FileStatus.

Across FileStatus, I ensured the stat and lstat calls only get executed lazily (don't call again if already initialized), unless explicitly asking to refresh them (forcing the new stat call).

I also added the original unit tests to verify both the hidden flag and the readonly flag (we didn't have unit tests for the last one).

I made sure all unit tests are passing in MacOS I also ran a temporary unit test with the below code to quickly verify that the performance regression is gone:

Temporary perf test
[Fact]
public void Test()
{
  var sw = new Diagnostics.Stopwatch();
  // Inside the `experiments` folder, I created one symbolic link that points to a folder,
  // and one symbolic link that points to a file
  // This ensures the `IsSymbolicLink` method I added is called.
  string path = "/Users/calope/Public/repos/experiments";
  string pattern = "*";

  Console.WriteLine("Starting...");
  sw.Start();
  for (int i = 0; i < 7000; i++)
  {
     // This is the exact same code we have in a FileSystem benchmark:
     // https://github.com/dotnet/performance/blob/a25a963e3f14214102207a2b75a5ece25162a333/src/benchmarks/micro/libraries/System.IO.FileSystem/Perf.Directory.cs#L117
      Directory.EnumerateFiles(path, pattern, SearchOption.AllDirectories).Count();
  }
  sw.Stop();
  Console.WriteLine($"Elapsed: {sw.ElapsedMilliseconds}");
}

cc @iSazonov

Pending: I want to get official dotnet/performance results for all the IO.FIleSystem performance tests. But I need to figure out how to run the micro benchmarks in .NET 6.0 in MacOS, because the Preventing regressions instructions are not working anymore. I get this error:

Error message
calope@Carloss-MacBook-Pro performance % cd src/benchmarks/micro 
calope@Carloss-MacBook-Pro micro % dotnet run -c release -f netcoreapp5.0 --artifacts "~/Public/perf_after" --corerun "~/Public/repos/runtime/artifacts/bin/testhost/net6.0-OSX-Release-x64/shared/Microsoft.NETCore.App/6.0.0/corerun" --filter System.IO.Tests.Perf_Directory.EnumerateFiles
/usr/local/share/dotnet/sdk/5.0.102/Sdks/Microsoft.NET.Sdk/targets/Microsoft.NET.TargetFrameworkInference.targets(141,5): 
error NETSDK1045: The current .NET SDK does not support targeting .NET Core 6.0.  
Either target .NET Core 5.0 or lower, or use a version of the .NET SDK that supports .NET Core 6.0. 
[/Users/calope/Public/repos/performance/src/benchmarks/micro/MicroBenchmarks.csproj]
The build failed. Fix the build errors and run again.

Edits:

  • I had originally reported a perf improvement with my original changes. Unfortunately, they also had a bug uncovered by the CI, which I'm investigating.
  • CI issues fixed. Duration of the test before and after my changes:
    • Without changes: 44894
    • With changes: 46904

…' cases, but not failure. Use _secondaryCache and _initializedSecondaryCache.
…yExistIndependentlyOfTarget unit test (link should exist after main file deleted).
    Need to preserve the _isDirectory field. The order and the place in which it acquires a value matter. It should also be set to false when EnsureStatInitialized is called. The unit test that verifies this is System.IO.Tests.DirectoryInfo_Exists.SymLinksMayExistIndependentlyOfTarget.
….ToFileSystemInfo, since it's the only place where it is used. Move the Debug.Assert to the top with the ROS<char> instead of the string.
@carlossanlop carlossanlop added this to the 6.0.0 milestone Jan 22, 2021
@carlossanlop carlossanlop added this to In progress in Better for Linux via automation Jan 22, 2021
@carlossanlop carlossanlop added this to In progress in System.IO - File system via automation Jan 22, 2021
@carlossanlop carlossanlop self-assigned this Jan 22, 2021
@stephentoub
Copy link
Member

If you are reviewing this PR, please take a look at each commit individually (small incremental changes).

Which of the 28 commits here fixes the actual issue? Lots of these looks like style.

@carlossanlop
Copy link
Member Author

Which of the 28 commits here fixes the actual issue? Lots of these looks like style.

Only a few were style. This is the commit that fixes the hidden bug: 62b77b8

@carlossanlop
Copy link
Member Author

carlossanlop commented Jan 22, 2021

The CI is failing in OSX in tests related to the hidden flag. I'm investigating.

Strangely, they are passing in my MacOS.

Edit: I'm making it a draft while I address the failures.

@carlossanlop carlossanlop marked this pull request as draft January 22, 2021 20:51
…eSystemEntry.Initialize calls to refresh individual caches.
… hidden check in Initialize to prevent perf regression.
@carlossanlop carlossanlop marked this pull request as ready for review January 22, 2021 23:49
Copy link
Member

@adamsitnik adamsitnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The right way to fix it is to move all the stat and lstat calling inside FileStatus

This is a very good idea. 👍

The changes look good, I would just improve the naming a little bit and encapsulate the caching logic within the FileStatus type.

As soon as you fix the CI errors I am going to perform a more detailed review.

I get this error

As mentioned offline, you get an error about missing .NET 6 SDK. Perf repo always requires latest SDK, so we can have benchmarks that test new APIs (https://github.com/dotnet/performance/blob/master/docs/prerequisites.md#net-core-sdk)

Comment on lines +55 to +59
if (useHiddenFlag)
{
fileTwo.Attributes |= FileAttributes.Hidden;
fileFour.Attributes |= FileAttributes.Hidden;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just for my education: is it possible to set these attributes before creating the file and have FileInfo.Create create the file with the attributes provided?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. The file needs to exist. The Attributes setter calls the FileStatus.SetAttributes method (you can see it in this PR), which sets each flag into the file via a P/Invoke.


[Fact]
[PlatformSpecific(TestPlatforms.AnyUnix)]
public void SkippingHiddenFiles_Unix()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think that the test name should include the information about using prefixes.

Suggested change
public void SkippingHiddenFiles_Unix()
public void SkippingFilesHiddenWithDotPrefixOnUnix()

@@ -21,25 +21,42 @@ protected virtual string[] GetPaths(string directory, EnumerationOptions options
}

[Fact]
public void SkippingHiddenFiles()
[PlatformSpecific(TestPlatforms.Windows | TestPlatforms.OSX)]
public void SkippingHiddenFiles_Windows_OSX()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think that the test name should include the information about using attributes.

Suggested change
public void SkippingHiddenFiles_Windows_OSX()
public void SkippingFilesHiddenWithAttributesOnWindowsAndOSX()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens when the user sets the attribute on Linux? Is it just ignored and the user gets silent error?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No error. It just gets ignored. I just tested it on Ubuntu.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No error. It just gets ignored.

I wonder if we should address that somehow. Usually, we would just mark the enum member with UnsupportedOSPlatform

public enum FileAttributes
{
   [UnsupportedOSPlatform("Linux")]
   Hidden
}

But if I understand correctly, the enum can be returned by File.GetAttributes for files that start with a dot on Linux, but has no effect when used with the corresponding setter. Should we at least mention it in the docs?

IEnumerable<string> enumerable = new FileSystemEnumerable<string>(
testDirectory.FullName,
(ref FileSystemEntry entry) => entry.ToFullPath(),
new EnumerationOptions() { AttributesToSkip = 0 })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does setting AttributesToSkip to 0 here has any effect on the test behavior? Would removing it change anything?

Suggested change
new EnumerationOptions() { AttributesToSkip = 0 })
new EnumerationOptions())

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the IsHiddenAttribute method starts failing if I don't explicitly set this to 0 (to signify "don't skip any attributes").

The reason is because when unset, then by default we skip the Hidden and the System attributes. That's how EnumerationOptions.AttributesToSkip is documented.

Skip entries with the given attributes. Default is FileAttributes.Hidden | FileAttributes.System.

Comment on lines +39 to +40
entry._status = default;
entry._status.Invalidate();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be possible to encapsulate the logic that is modifying the FileStatus below and move it into FileStatus itself? So the FileStatus could keep all the caching logic as it's private detail and return only a valid FileStatus?

Suggested change
entry._status = default;
entry._status.Invalidate();
entry._status = FileStatus.Create(entry, directoryEntry, out FileAttributes attributes);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

else if (_fileStatus.Gid == Interop.Sys.GetEGid())
}

internal bool HasSecondaryDirectoryFlag => HasDirectoryFlag(_secondaryCache);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HasSecondaryDirectoryFlag does not tell me much. Is it checking if the symlink is a directory?

Suggested change
internal bool HasSecondaryDirectoryFlag => HasDirectoryFlag(_secondaryCache);
internal bool IsSymlinkToDirectory => HasDirectoryFlag(_secondaryCache);

Copy link
Member Author

@carlossanlop carlossanlop Jan 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is checking if the symbolic link's target is a directory.
Edit: I renamed it appropriately, based on the other related feedback.


private bool IsValid =>
_initializedMainCache == 0 && // Should always be successfully refreshed
(_initializedSecondaryCache == -1 || _initializedSecondaryCache == 0); // Only refreshed when path is detected to be a symbolic link
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(_initializedSecondaryCache == -1 || _initializedSecondaryCache == 0); // Only refreshed when path is detected to be a symbolic link
_initializedSecondaryCache <= 0; // Only refreshed when path is detected to be a symbolic link

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 means successful retrieval. -1 means uninitialized. Any other value means it's a linux error number. Although I don't think error numbers can be negative, I wanted to be very specific about the two possible values I cared about.
I assume it has a slight performance improvement to do one check instead of two?

Copy link
Member

@adamsitnik adamsitnik Jan 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume it has a slight performance improvement to do one check instead of two?

If it has any, it's most probably very, very small. I was more thinking about simplifying the code. As long as everything is encapsulated within this type, we should rather not worry about values that are negative, but not equal -1 (as we control everything in this type and it's impossible). But it's a minor nit.


internal bool IsSecondaryCacheValid => _initializedSecondaryCache == 0;

private bool IsValid =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: When I see FileStatus.IsValid I think of the file being valid, not file status cache. Would HasValidCache be a better name?

Suggested change
private bool IsValid =>
private bool HasValidCache =>

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I'll use HasValidCaches (plural).

}

internal void SetLastWriteTime(string path, DateTimeOffset time) => SetAccessOrWriteTime(path, time, isAccessTime: false);
internal void RefreshCaches(ReadOnlySpan<char> path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as mentioned before, it would be great if we could move all the methods that need to call this particular method inside FileStatus and make it private, so the cache would become an implementation detail to the consumers

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a difference between called EnsureCachesInitialized and RefreshCaches: the former will check if they are initialized, and if not, it will refresh them. The second one will always refresh them.
Maybe I can pass a boolean to indicate if we want to force refresh or not. That way, only one method is visible outside.


public FileSystemInfo ToFileSystemInfo()
{
Debug.Assert(!PathInternal.IsPartiallyQualified(FullPath), "FullPath should be fully qualified when constructed from directory enumeration");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug.Assert in public API? If it is important check it should be a real time check.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to the other comment below, I decided to revert this change since it was unrelated to the purpose of this PR.

Comment on lines +171 to +175
FileSystemInfo info = IsDirectory
? new DirectoryInfo(fullPath, fileName: fileName, isNormalized: true)
: new FileInfo(fullPath, fileName: fileName, isNormalized: true);

info.Init(ref _status);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not create internal ctor-s with the parameter?

Copy link
Member Author

@carlossanlop carlossanlop Jan 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I can revert this change. Originally it was not a constructor but a factory method. I thought we could save one method call if I put all that logic in the only place where it was called. But it's unrelated to this change anyway.

Comment on lines +288 to +289
}
private void ThrowOnCacheInitializationError(ReadOnlySpan<char> path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
}
private void ThrowOnCacheInitializationError(ReadOnlySpan<char> path)
}
private void ThrowOnCacheInitializationError(ReadOnlySpan<char> path)


// We track intent of creation to know whether or not we want to (1) create a
// DirectoryInfo around this status struct or (2) actually are part of a DirectoryInfo.
internal bool InitiallyDirectory { get; set; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is seem assigned only in ctor.

Suggested change
internal bool InitiallyDirectory { get; set; }
internal bool InitiallyDirectory { get; private set; }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is seem assigned only in ctor.

So it should be { get; } instead of { get; private set; } ?

Comment on lines 110 to +112
// IMPORTANT: Attribute logic must match the logic in FileSystemEntry

EnsureStatInitialized(path);
EnsureCachesInitialized(path);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the comment? Is all logic in one place now?

Interop.CheckIo(Interop.Sys.LChflags(path, (_fileStatus.UserFlags & ~(uint)Interop.Sys.UserFlags.UF_HIDDEN)), path, InitiallyDirectory);
}
}
return DateTimeOffset.FromFileTime(0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we return const?

Comment on lines +230 to +239
if (!IsMainCacheValid)
{
if (!TryRefreshMainCache(path))
{
if (!continueOnError)
{
ThrowOnCacheInitializationError(path);
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use one if? Or move all the logic in TryRefreshMainCache()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can move it all to one if. I don't think we should throw from within a try method.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean IsMainCacheValid check could be in TryRefreshMainCache().

private unsafe void SetAccessOrWriteTime(string path, DateTimeOffset time, bool isAccessTime)
{
// force a refresh so that we have an up-to-date times for values not being overwritten
_fileStatusInitialized = -1;
EnsureStatInitialized(path);
Invalidate();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Invalidate that? Can we use more informative name?

@carlossanlop
Copy link
Member Author

Had a conversation with @adamsitnik about this. I am going to look into splitting the Unix code into Linux and OSX, because it may be possible to use an API (at least on Linux) that lets us retrieve the list of files inside a folder with the hidden flag specified. This will allow us to have separate performance results for those platforms. I will do the same investigation for the Windows API.

@carlossanlop carlossanlop marked this pull request as draft January 29, 2021 17:42
@iSazonov
Copy link
Contributor

This could fix #31301 too.

@iSazonov
Copy link
Contributor

Currently I have worked on improving PowerShell to handle OneDrive. It is a reparse points. And we need to analyze a surrogates. This information is not present in FileSystemInfo.FileAttributes - there is only one flag RepasePoint but we need more and we have to call FindFirstFileEx and check dwReserved0.

Thus, we need extended attributes not only on Unix, but also on Windows. Perhaps we could introduce new FileExtendedAttributes property.
This would not only allow us advanced scenarios but also help solve the performance problem.

Base automatically changed from master to main March 1, 2021 09:07
@ghost ghost closed this Mar 31, 2021
Better for Linux automation moved this from In progress to Done Mar 31, 2021
System.IO - File system automation moved this from In progress to Done Mar 31, 2021
@ghost
Copy link

ghost commented Mar 31, 2021

Draft Pull Request was automatically closed for inactivity. It can be manually reopened in the next 30 days if the work resumes.

@ghost ghost locked as resolved and limited conversation to collaborators Apr 30, 2021
@carlossanlop carlossanlop deleted the IsHidden2 branch February 7, 2024 06:42
This pull request was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Development

Successfully merging this pull request may close these issues.

FileSystemEntry.Attributes property is not correct on Unix
5 participants