Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FileStream Flush(true) significant performance degradation in .NET 8 when buffer size is bigger than the payload #100229

Closed
AddictedCS opened this issue Mar 25, 2024 · 8 comments
Labels
area-System.IO tenet-performance Performance related issue

Comments

@AddictedCS
Copy link

Description

The original issue was identified in apache/lucenenet#933; nevertheless, after deeper analysis, the root cause was narrowed down to the FileStream class, pointing towards a framework issue.

Below is the source code that reproduces the issue.

namespace Emy.Benchmark;

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Jobs;

[SimpleJob(RuntimeMoniker.Net70, baseline: true)]
[SimpleJob(RuntimeMoniker.Net80)]
[GcServer(true)]
public class FileStreamBenchmarks
{
    private byte[] buffer = null!;

    [GlobalSetup]
    public void Setup()
    {
        if (Directory.Exists("test_dir"))
            Directory.Delete("test_dir", true);
        Directory.CreateDirectory("test_dir");
        
        buffer = PayloadSize.RandomBytes();
    }
    
    [Params(512, 1024, 4096)]
    public int PayloadSize { get; set; }
    
    [Params(1024, 4096, 8192)]
    public int BufferSize { get; set; }

    [Benchmark] 
    public int WriteStream() =>  WriteData(buffer, BufferSize);
    
    private static int WriteData(byte[] buffer, int bufferSize)
    {
        using var stream = 
            new FileStream(
                path: Path.Combine("test_dir", $"test.{buffer.Length}.bin"),
                mode: FileMode.OpenOrCreate,
                access: FileAccess.Write,
                share: FileShare.ReadWrite,
                bufferSize: bufferSize);
        stream.Write(buffer, 0, buffer.Length);
        stream.Flush(flushToDisk: true);
        return buffer.Length;
    }
}

public static class Bytes
{
    public static byte[] RandomBytes(this int self)
    {
        byte[] buffer = new byte[self];
        Random.Shared.NextBytes(buffer);
        return buffer;
    }
}

Project configuration

<Project Sdk="Microsoft.NET.Sdk">

    <PropertyGroup>
        <OutputType>Exe</OutputType>
        <TargetFrameworks>net7.0;net8.0;net9.0</TargetFrameworks>
        <ImplicitUsings>enable</ImplicitUsings>
        <Nullable>enable</Nullable>
    </PropertyGroup>

    <ItemGroup>
        <PackageReference Include="BenchmarkDotNet" Version="0.13.12" />
        <PackageReference Include="BenchmarkDotNet.Diagnostics.dotTrace" Version="0.13.12" />
    </ItemGroup>

</Project>

Configuration

  • Which version of .NET is the code running on?
    .NET 8, or .NET 9
  • What OS version, and what distro if applicable?
    macOS
  • What is the architecture (x64, x86, ARM, ARM64)?
    ARM64

Regression?

The issue is reproducible in .NET 8 and .NET 9.

Data

Notice when the buffer size is bigger than the payload, performance degrades ~150x.

BenchmarkDotNet v0.13.12, macOS Sonoma 14.4 (23E214) [Darwin 23.4.0]
Apple M2 Max, 1 CPU, 12 logical and 12 physical cores
.NET SDK 9.0.100-preview.2.24157.14
  [Host]   : .NET 8.0.3 (8.0.324.11423), Arm64 RyuJIT AdvSIMD
  .NET 7.0 : .NET 7.0.5 (7.0.523.17405), Arm64 RyuJIT AdvSIMD
  .NET 8.0 : .NET 8.0.3 (8.0.324.11423), Arm64 RyuJIT AdvSIMD

Server=True  

| Method      | Job      | Runtime  | PayloadSize | BufferSize | Mean        | Error     | StdDev     | Ratio  | RatioSD |
|------------ |--------- |--------- |------------ |----------- |------------:|----------:|-----------:|-------:|--------:|
| WriteStream | .NET 7.0 | .NET 7.0 | 512         | 1024       |    28.90 μs |  0.330 μs |   0.292 μs |   1.00 |    0.00 |
| WriteStream | .NET 8.0 | .NET 8.0 | 512         | 1024       | 4,416.57 μs | 87.109 μs | 196.619 μs | 154.81 |    5.78 |
|             |          |          |             |            |             |           |            |        |         |
| WriteStream | .NET 7.0 | .NET 7.0 | 512         | 4096       |    27.75 μs |  0.550 μs |   1.184 μs |   1.00 |    0.00 |
| WriteStream | .NET 8.0 | .NET 8.0 | 512         | 4096       | 4,298.04 μs | 85.603 μs | 149.926 μs | 155.93 |   10.91 |
|             |          |          |             |            |             |           |            |        |         |
| WriteStream | .NET 7.0 | .NET 7.0 | 512         | 8192       |    28.50 μs |  0.352 μs |   0.329 μs |   1.00 |    0.00 |
| WriteStream | .NET 8.0 | .NET 8.0 | 512         | 8192       | 4,423.72 μs | 87.476 μs | 225.804 μs | 150.71 |    5.80 |
|             |          |          |             |            |             |           |            |        |         |
| WriteStream | .NET 7.0 | .NET 7.0 | 1024        | 1024       | 4,289.43 μs | 85.522 μs | 142.889 μs |   1.00 |    0.00 |
| WriteStream | .NET 8.0 | .NET 8.0 | 1024        | 1024       | 4,466.22 μs | 87.752 μs | 173.214 μs |   1.05 |    0.05 |
|             |          |          |             |            |             |           |            |        |         |
| WriteStream | .NET 7.0 | .NET 7.0 | 1024        | 4096       |    28.15 μs |  0.550 μs |   0.949 μs |   1.00 |    0.00 |
| WriteStream | .NET 8.0 | .NET 8.0 | 1024        | 4096       | 4,416.73 μs | 84.027 μs | 210.808 μs | 159.07 |    9.01 |
|             |          |          |             |            |             |           |            |        |         |
| WriteStream | .NET 7.0 | .NET 7.0 | 1024        | 8192       |    29.14 μs |  0.257 μs |   0.241 μs |   1.00 |    0.00 |
| WriteStream | .NET 8.0 | .NET 8.0 | 1024        | 8192       | 4,597.65 μs | 86.784 μs | 209.593 μs | 158.45 |    9.97 |
|             |          |          |             |            |             |           |            |        |         |
| WriteStream | .NET 7.0 | .NET 7.0 | 4096        | 1024       | 4,353.85 μs | 85.278 μs | 166.328 μs |   1.00 |    0.00 |
| WriteStream | .NET 8.0 | .NET 8.0 | 4096        | 1024       | 4,204.54 μs | 75.218 μs | 110.254 μs |   0.97 |    0.05 |
|             |          |          |             |            |             |           |            |        |         |
| WriteStream | .NET 7.0 | .NET 7.0 | 4096        | 4096       | 4,257.61 μs | 83.921 μs | 133.108 μs |   1.00 |    0.00 |
| WriteStream | .NET 8.0 | .NET 8.0 | 4096        | 4096       | 4,431.81 μs | 88.581 μs | 198.124 μs |   1.04 |    0.05 |
|             |          |          |             |            |             |           |            |        |         |
| WriteStream | .NET 7.0 | .NET 7.0 | 4096        | 8192       |    28.78 μs |  0.148 μs |   0.138 μs |   1.00 |    0.00 |
| WriteStream | .NET 8.0 | .NET 8.0 | 4096        | 8192       | 4,187.94 μs | 77.722 μs |  68.898 μs | 145.63 |    2.46 |

Also, if we disable the buffering altogether, bufferSize: 0, then the performance between .NET 7 and .NET 8 is on par.

Analysis

After enabling tracing, it seems the problem is in the fsync() call; at least, this is the difference between buffered, non-buffered and buffer size less than the payload stack trace.

Buffered (buffer size larger than the payload) stack trace (slow performance):
Screenshot 2024-03-24 at 6 20 30 PM

Non-buffered stack trace (fast performance):
Screenshot 2024-03-24 at 7 04 19 PM

Buffered (buffer size less than payload) (fast performance):
Screenshot 2024-03-24 at 8 53 02 PM

In summary, the DotNet Framework seems to call the fsync() method when the buffer is bigger than the payload, which leads to severe performance degradation.

@paulirwin, @eladmarg, @jeme

@AddictedCS AddictedCS added the tenet-performance Performance related issue label Mar 25, 2024
@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Mar 25, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Mar 25, 2024
@huoyaoyuan huoyaoyuan added area-System.IO and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Mar 25, 2024
@jeme
Copy link

jeme commented Mar 25, 2024

What I find interesting is that its actually .NET 7 and when the data is less that the buffer capacity that is the odd one out. As soon as .NET 7 hits the buffer capacity, there is a huge jump in the time it takes to flush the data and it goes on par with .NET 8, .NET seems to stay more stable.

| Method      | Job      | Runtime  | PayloadSize | BufferSize | Mean        | Error     | StdDev     | Ratio  | RatioSD |
| WriteStream | .NET 7.0 | .NET 7.0 | 512         | 1024       |    28.90 μs |  0.330 μs |   0.292 μs |   1.00 |    0.00 |
| WriteStream | .NET 7.0 | .NET 7.0 | 1024        | 1024       | 4,289.43 μs | 85.522 μs | 142.889 μs |   1.00 |    0.00 |
| WriteStream | .NET 7.0 | .NET 7.0 | 4096        | 1024       | 4,353.85 μs | 85.278 μs | 166.328 μs |   1.00 |    0.00 |
|             |          |          |             |            |             |           |            |        |         |
| WriteStream | .NET 8.0 | .NET 8.0 | 512         | 1024       | 4,416.57 μs | 87.109 μs | 196.619 μs | 154.81 |    5.78 |
| WriteStream | .NET 8.0 | .NET 8.0 | 1024        | 1024       | 4,466.22 μs | 87.752 μs | 173.214 μs |   1.05 |    0.05 |
| WriteStream | .NET 8.0 | .NET 8.0 | 4096        | 1024       | 4,204.54 μs | 75.218 μs | 110.254 μs |   0.97 |    0.05 |


| Method      | Job      | Runtime  | PayloadSize | BufferSize | Mean        | Error     | StdDev     | Ratio  | RatioSD |
| WriteStream | .NET 7.0 | .NET 7.0 | 512         | 4096       |    27.75 μs |  0.550 μs |   1.184 μs |   1.00 |    0.00 |
| WriteStream | .NET 7.0 | .NET 7.0 | 1024        | 4096       |    28.15 μs |  0.550 μs |   0.949 μs |   1.00 |    0.00 |
| WriteStream | .NET 7.0 | .NET 7.0 | 4096        | 4096       | 4,257.61 μs | 83.921 μs | 133.108 μs |   1.00 |    0.00 |
|             |          |          |             |            |             |           |            |        |         |
| WriteStream | .NET 8.0 | .NET 8.0 | 512         | 4096       | 4,298.04 μs | 85.603 μs | 149.926 μs | 155.93 |   10.91 |
| WriteStream | .NET 8.0 | .NET 8.0 | 1024        | 4096       | 4,416.73 μs | 84.027 μs | 210.808 μs | 159.07 |    9.01 |
| WriteStream | .NET 8.0 | .NET 8.0 | 4096        | 4096       | 4,431.81 μs | 88.581 μs | 198.124 μs |   1.04 |    0.05 |


| Method      | Job      | Runtime  | PayloadSize | BufferSize | Mean        | Error     | StdDev     | Ratio  | RatioSD |
| WriteStream | .NET 7.0 | .NET 7.0 | 512         | 8192       |    28.50 μs |  0.352 μs |   0.329 μs |   1.00 |    0.00 |
| WriteStream | .NET 7.0 | .NET 7.0 | 1024        | 8192       |    29.14 μs |  0.257 μs |   0.241 μs |   1.00 |    0.00 |
| WriteStream | .NET 7.0 | .NET 7.0 | 4096        | 8192       |    28.78 μs |  0.148 μs |   0.138 μs |   1.00 |    0.00 |
|             |          |          |             |            |             |           |            |        |         |
| WriteStream | .NET 8.0 | .NET 8.0 | 512         | 8192       | 4,423.72 μs | 87.476 μs | 225.804 μs | 150.71 |    5.80 |
| WriteStream | .NET 8.0 | .NET 8.0 | 1024        | 8192       | 4,597.65 μs | 86.784 μs | 209.593 μs | 158.45 |    9.97 |
| WriteStream | .NET 8.0 | .NET 8.0 | 4096        | 8192       | 4,187.94 μs | 77.722 μs |  68.898 μs | 145.63 |    2.46 |

Since all writes would fit inside a single block, it's not strange to me that there isn't any noticeable difference in the time it takes to write the data.

@stephentoub
Copy link
Member

stephentoub commented Mar 25, 2024

If I've understood your report correctly, this is by design. The purpose of Flush(true) is to flush to disk, but it wasn't in some cases in .NET 6/7. #77384 fixed that.

@jeme
Copy link

jeme commented Mar 25, 2024

@stephentoub Y, I was actually beginning to wonder if this was actually due to a bug fix.

Lets link the bug as well: #77373

Considering dispose also happens, I assume the data would still end up on the disk with .NET 7.0, but the penalty was just paid asynchronously later, was that by the disk (e.g. we are just writing to the disks buffers but not ensuring that it flushes those) or was it something underlying in the OS? (unless a catastrophic crash was encountered, which would leave the file "corrupt" I assume, which in turn is why this fix is actually important).

@stephentoub
Copy link
Member

FileStream.Dispose ensures that anything buffered in the FileStream's byte[] buffer would be written to the OS, but it doesn't guarantee that the OS will turn have flushed everything to persisted storage.

@stephentoub
Copy link
Member

cc: @adamsitnik

@MichalPetryka
Copy link
Contributor

FileStream.Dispose ensures that anything buffered in the FileStream's byte[] buffer would be written to the OS, but it doesn't guarantee that the OS will turn have flushed everything to persisted storage.

So it corresponds to FileStream.Flush(false) then, right?

@stephentoub
Copy link
Member

FileStream.Dispose ensures that anything buffered in the FileStream's byte[] buffer would be written to the OS, but it doesn't guarantee that the OS will turn have flushed everything to persisted storage.

So it corresponds to FileStream.Flush(false) then, right?

Yes.


public override void Flush() => Flush(flushToDisk: false);

@AddictedCS
Copy link
Author

I am closing this issue. The bug fix introduced the performance penalty, but it is by design.

@dotnet-policy-service dotnet-policy-service bot removed the untriaged New issue has not been triaged by the area owner label Mar 27, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Apr 26, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.IO tenet-performance Performance related issue
Projects
None yet
Development

No branches or pull requests

5 participants