Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use FileStream buffering only when it makes sense #51048

Open
Tracked by #64596
adamsitnik opened this issue Apr 10, 2021 · 5 comments
Open
Tracked by #64596

Use FileStream buffering only when it makes sense #51048

adamsitnik opened this issue Apr 10, 2021 · 5 comments
Labels
area-System.IO help wanted [up-for-grabs] Good issue for external contributors tenet-performance Performance related issue
Milestone

Comments

@adamsitnik
Copy link
Member

adamsitnik commented Apr 10, 2021

Whenever buffering is enabled, we need to allocate BufferedFileStreamStrategy

internal static FileStreamStrategy EnableBufferingIfNeeded(WindowsFileStreamStrategy strategy, int bufferSize)
=> bufferSize == 1 ? strategy : new BufferedFileStreamStrategy(strategy, bufferSize);

which has a finalizer (that is an additional work for the GC):

and aquires lock for every ReadAsync and WriteAsync:

SemaphoreSlim semaphore = EnsureAsyncActiveSemaphoreInitialized();

SemaphoreSlim semaphore = EnsureAsyncActiveSemaphoreInitialized();

this is of course done for some good reasons, but the thing is that we don't always need the buffering.

A good example is File.WriteAllTextAsync where we create a FileStream with buffering enabled:

FileStream stream = new FileStream(
path, append ? FileMode.Append : FileMode.Create, FileAccess.Write, FileShare.Read, DefaultBufferSize,
FileOptions.Asynchronous | FileOptions.SequentialScan);

but later use buffers that are at least as big as the internal FileStream buffer (so we never take advantage of buffering)

buffer = ArrayPool<char>.Shared.Rent(DefaultBufferSize);

to write to the file:

await sw.WriteAsync(new ReadOnlyMemory<char>(buffer, 0, batchSize), cancellationToken).ConfigureAwait(false);

The person who is willing to work on this task should:

  1. Search the entire runtime repo for places where buffering is enabled (bufferSize of the FileStream ctor is not set to 1 in an explicit way), but it hurts the performance like in the example above. If possible, please verify that FileStream used for async IO is created with FileOptions.Asynchronous or isAsync: true flag.
  2. Contribute benchmarks to the dotnet/performance repo after reading the Microbenchmark Design Guidelines doc. The benchmarks need to cover the code that is about to be changed.
  3. Run the benchmarks following the Benchmarking workflow for dotnet/runtime repository doc and ensure the changes are improving perf
  4. Contribute the improvements to dotnet/runtime and provide benchmark results
@adamsitnik adamsitnik added area-System.IO tenet-performance Performance related issue help wanted [up-for-grabs] Good issue for external contributors labels Apr 10, 2021
@adamsitnik adamsitnik added this to the 6.0.0 milestone Apr 10, 2021
@adamsitnik adamsitnik added this to To do in System.IO - FileStream via automation Apr 10, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Apr 10, 2021
@ghost
Copy link

ghost commented Apr 10, 2021

Tagging subscribers to this area: @carlossanlop
See info in area-owners.md if you want to be subscribed.

Issue Details

Whenever buffering is enabled, we need to allocate BufferedFileStreamStrategy

internal static FileStreamStrategy EnableBufferingIfNeeded(WindowsFileStreamStrategy strategy, int bufferSize)
=> bufferSize == 1 ? strategy : new BufferedFileStreamStrategy(strategy, bufferSize);

which has a finalizer (that is an additional work for the GC):

and aquires lock for every ReadAsync and WriteAsync:

SemaphoreSlim semaphore = EnsureAsyncActiveSemaphoreInitialized();

SemaphoreSlim semaphore = EnsureAsyncActiveSemaphoreInitialized();

this is of course done for some good reasons, but the thing is that we don't always need the buffering.

A good example is File.WriteAllTextAsync where we create a FileStream with buffering enabled:

FileStream stream = new FileStream(
path, append ? FileMode.Append : FileMode.Create, FileAccess.Write, FileShare.Read, DefaultBufferSize,
FileOptions.Asynchronous | FileOptions.SequentialScan);

but later use buffers that are at least as big as the internal FileStream buffer (so we never take advantage of buffering)

buffer = ArrayPool<char>.Shared.Rent(DefaultBufferSize);

to write to the file:

await sw.WriteAsync(new ReadOnlyMemory<char>(buffer, 0, batchSize), cancellationToken).ConfigureAwait(false);

The person who is willing to work on this task should:

  1. Search the entire runtime repo for places where buffering is enabled (bufferSize of the FileStream ctor is not set to 1 in an explicit way), but it hurts the performance like in the example above.
  2. Contribute benchmarks to the dotnet/performance repo after reading the Microbenchmark Design Guidelines doc. The benchmarks need to cover the code that is about to be changed.
  3. Run the benchmarks following the Benchmarking workflow for dotnet/runtime repository doc and ensure the changes are improving perf
  4. Contribute the improvements to dotnet/runtime and provide benchmark results
Author: adamsitnik
Assignees: -
Labels:

area-System.IO, tenet-performance, up-for-grabs

Milestone: 6.0.0

@adamsitnik adamsitnik removed the untriaged New issue has not been triaged by the area owner label Apr 10, 2021
@huoyaoyuan
Copy link
Member

  1. If possible, please verify that FileStream used for async IO is created with FileOptions.Asynchronous or isAsync: true flag.

Related to #45946 (comment) . Currently using isAsync forces the user to provide a buffer size, which doesn't have a help to determine. It's very difficult to learn that 1 is optimal now.

@adamsitnik
Copy link
Member Author

It's very difficult to learn that 1 is optimal now.

I acknowledge the fact that setting bufferSize to 1 is not intuitive and a trick known to very few engineers.

Part of our .NET 6 FileStream work is to publish a doc about performance best practices. We are going to explain what buffering is, how it's different from the buffering done by the OS and how to disable it.

@sakno
Copy link
Contributor

sakno commented May 7, 2021

According to the current source code in main branch, this statement

I acknowledge the fact that setting bufferSize to 1 is not intuitive and a trick known to very few engineers.

is incorrect for Unix-like platforms. ChooseStrategy method always creates Net5CompatFileStreamStrategy without analysis of bufferSize argument as for Windows.

@stephentoub
Copy link
Member

stephentoub commented May 23, 2021

is incorrect for Unix-like platforms

It's not incorrect. The Net5CompatFileStreamStrategy lazily creates the buffer only if it's needed. With a _bufferSize of 1, any non-empty user buffer will result in skipping the buffer and it won't be lazily created. This is the case for both Windows and Unix in .NET 5, and has been for a long time.

@adamsitnik adamsitnik modified the milestones: 6.0.0, 7.0.0 Jul 22, 2021
@jeffhandley jeffhandley modified the milestones: 7.0.0, Future Aug 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-System.IO help wanted [up-for-grabs] Good issue for external contributors tenet-performance Performance related issue
Projects
Development

No branches or pull requests

5 participants