Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve File.Move performance #65092

Open
adamsitnik opened this issue Feb 9, 2022 · 9 comments
Open

Improve File.Move performance #65092

adamsitnik opened this issue Feb 9, 2022 · 9 comments
Assignees
Labels
area-System.IO help wanted [up-for-grabs] Good issue for external contributors tenet-performance Performance related issue
Milestone

Comments

@adamsitnik
Copy link
Member

Similarly to what I did in #63675 for Directory.Move there is most likely no need to perform a check whether source file exists in

if (!FileSystem.FileExists(fullSourceFileName))

The native sys-call that is called by FileSystem.MoveFile is most likely going to fail in such situations. We should just handle the FileSystem.MoveFile failure and map it to the same (existing) exception on both Windows and Unix.

The following condition can most likely avoid the check as well:

if (Interop.Sys.LStat(sourceFullPath, out sourceStat) == 0 && // source file exists
(Interop.Sys.LStat(destFullPath, out destStat) != 0 || // dest file does not exist
(sourceStat.Dev == destStat.Dev && // source and dest are on the same device
sourceStat.Ino == destStat.Ino)) && // source and dest are the same file on that device
Interop.Sys.Rename(sourceFullPath, destFullPath) == 0) // try the rename

(we could check if the source file is the same file/drive as destination only after we verify that destination exists. In cases where destination does not exists there is no need of doing that)

While working on this it would be nice to unify File.Move and FileInfo.MoveTo and fix #35122

The contributor who is willing to work on this issue needs to contribute File.Move benchmarks to the https://github.com/dotnet/performance repository. The File benchmarks can be found here. Here you can find docs that explain how to run the benchmarks against local build of dotnet/runtime.

@adamsitnik adamsitnik added area-System.IO tenet-performance Performance related issue help wanted [up-for-grabs] Good issue for external contributors labels Feb 9, 2022
@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Feb 9, 2022
@ghost
Copy link

ghost commented Feb 9, 2022

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Issue Details

Similarly to what I did in #63675 for Directory.Move there is most likely no need to perform a check whether source file exists in

if (!FileSystem.FileExists(fullSourceFileName))

The native sys-call that is called by FileSystem.MoveFile is most likely going to fail in such situations. We should just handle the FileSystem.MoveFile failure and map it to the same (existing) exception on both Windows and Unix.

The following condition can most likely avoid the check as well:

if (Interop.Sys.LStat(sourceFullPath, out sourceStat) == 0 && // source file exists
(Interop.Sys.LStat(destFullPath, out destStat) != 0 || // dest file does not exist
(sourceStat.Dev == destStat.Dev && // source and dest are on the same device
sourceStat.Ino == destStat.Ino)) && // source and dest are the same file on that device
Interop.Sys.Rename(sourceFullPath, destFullPath) == 0) // try the rename

(we could check if the source file is the same file/drive as destination only after we verify that destination exists. In cases where destination does not exists there is no need of doing that)

While working on this it would be nice to unify File.Move and FileInfo.MoveTo and fix #35122

The contributor who is willing to work on this issue needs to contribute File.Move benchmarks to the https://github.com/dotnet/performance repository. The File benchmarks can be found here. Here you can find docs that explain how to run the benchmarks against local build of dotnet/runtime.

Author: adamsitnik
Assignees: -
Labels:

area-System.IO, tenet-performance, up-for-grabs

Milestone: -

@adamsitnik adamsitnik removed the untriaged New issue has not been triaged by the area owner label Feb 9, 2022
@adamsitnik adamsitnik added this to the Future milestone Feb 9, 2022
@deeprobin
Copy link
Contributor

As described in #58240 you can assign the issue to me. But I think I'll just start working on that issue in about 1-3 weeks.

@adamsitnik
Copy link
Member Author

But I think I'll just start working on that issue in about 1-3 weeks.

@deeprobin Please ping me then and I'll assign you. Thanks!

@rhuijben
Copy link

rhuijben commented May 8, 2022

@deeprobin ping

@deeprobin
Copy link
Contributor

@rhuijben Thank you.
@adamsitnik happy to assign me

@adamsitnik
Copy link
Member Author

happy to assign me

done!

@deeprobin
Copy link
Contributor

I think the PR will come at the end of this week or next week.

@deeprobin
Copy link
Contributor

@adamsitnik I know it's not perfect yet, but I noticed test failures on Windows right after I removed the check.
How should I proceed?
For reference: https://github.com/deeprobin/runtime/tree/issue-65092

      System.IO.Tests.File_Move.NonExistentPath [FAIL]
        Assert.Throws() Failure
        Expected: typeof(System.IO.FileNotFoundException)
        Actual:   typeof(System.IO.DirectoryNotFoundException): Could not find a part of the path.
      System.IO.Tests.File_Move.WindowsWildCharacterPath_Core [FAIL]
        Assert.Throws() Failure
        Expected: typeof(System.IO.FileNotFoundException)
        Actual:   typeof(System.IO.IOException): Die Syntax für den Dateinamen, Verzeichnisnamen oder die Datenträgerbe
  zeichnung ist falsch.

Benchmark Results

.NET 7.0 Preview 4

Method Mean Error StdDev Median Min Max Allocated
Move 480.7 us 34.75 us 38.62 us 477.4 us 409.9 us 555.1 us 480 B

deeprobin/runtime#issue-65092

Method Mean Error StdDev Median Min Max Allocated
Move 582.8 us 47.63 us 54.85 us 581.5 us 490.2 us 718.2 us 480 B

(Why is there a regression - I only removed the exists-check)

@rhuijben
Copy link

The issue is that you get a different error than before, which is a regression.

I would like to see an attempt to check for this case after the error, so callers see the expected error when there is an error, but the code for the non-error case is faster than before, by avoiding the performance penalty of another sys all.

Personally I wouldn't care too much about one specific error or another one, as that difference is quite common in platform agnostic code... But in this case backwards compatibility matters the most.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-System.IO help wanted [up-for-grabs] Good issue for external contributors tenet-performance Performance related issue
Projects
None yet
Development

No branches or pull requests

3 participants