Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FileStream file preallocation performance #45946

Closed
dmex opened this issue Dec 11, 2020 · 22 comments · Fixed by #51111
Closed

FileStream file preallocation performance #45946

dmex opened this issue Dec 11, 2020 · 22 comments · Fixed by #51111
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.IO tenet-performance Performance related issue
Milestone

Comments

@dmex
Copy link

dmex commented Dec 11, 2020

Edit by carlossanlop: API Proposal can be found here.


Description

The FileStream class doesn't currently support Windows file preallocation hints reducing the performance of file operations while also increasing the disk fragmentation of newly created files.

If we use the example for file uploads for aspnetcore here:
https://docs.microsoft.com/en-us/aspnet/core/mvc/models/file-uploads?view=aspnetcore-5.0#file-upload-scenarios

using (var stream = System.IO.File.Create(filePath))
{
     await formFile.CopyToAsync(stream);
}

None of the .NET classes support passing the AllocationSize when creating the file even though it's been a feature included with Windows since 2000. When you're creating multiple files with large file sizes such as with servers and file uploads, installers or build servers (we use .netcore for our builds servers compiling native C) passing the file length as AllocationSize when creating the file can significantly reduce fragmentation and improve performance.

An example of passing the allocation size:

using (var stream = System.IO.File.Create(filePath, allocationSize: 1073741824))
{
     await formFile.CopyToAsync(stream);
}

Windows Vista and above support the creation of files with their initial AllocationSize. This hint is passed to the file system driver and we can use the published FAT driver source code to show these optimizations:
https://github.com/microsoft/Windows-driver-samples/blob/master/filesys/fastfat/allocsup.c#L1164
https://github.com/microsoft/Windows-driver-samples/blob/master/filesys/fastfat/allocsup.c#L1233-L1250

The FatTruncateFileAllocation procedure becomes a noop when the AllocationSize is valid:
https://github.com/microsoft/Windows-driver-samples/blob/master/filesys/fastfat/allocsup.c#L1533

You can search for AllocationSize and see the other optimizations when the value is known during file creation:
https://github.com/microsoft/Windows-driver-samples/blob/6c1981b8504329521343ad00f32daa847fa6083a/filesys/fastfat/create.c#L6493-L6506

It would be safe to assume similar optimisations in the NTFS and ReFS drivers when AllocationSize is valid.

Configuration

The AllocationSize must be passed with NtCreateFile when a file is being created, overwritten, or superseded. NtCreateFile is documented here: https://docs.microsoft.com/en-us/windows/win32/api/winternl/nf-winternl-ntcreatefile

NTSTATUS
NTAPI
NtCreateFile(
    _Out_ PHANDLE FileHandle,
    _In_ ACCESS_MASK DesiredAccess,
    _In_ POBJECT_ATTRIBUTES ObjectAttributes,
    _Out_ PIO_STATUS_BLOCK IoStatusBlock,
    _In_opt_ PLARGE_INTEGER AllocationSize, // <---- File size
    _In_ ULONG FileAttributes,
    _In_ ULONG ShareAccess,
    _In_ ULONG CreateDisposition,
    _In_ ULONG CreateOptions,
    _In_reads_bytes_opt_(EaLength) PVOID EaBuffer,
    _In_ ULONG EaLength
    );

FileStream should also support get/set for the file allocation size like it currently does for the file size using GetFileInformationByHandleEx with the FileAllocationInfo class and FILE_ALLOCATION_INFO however this doesn't receive the full benefits of passing the AllocationSize up front with NtCreateFile so both methods should be supported for different use cases.

Analysis

A reduction in IO for our workloads when using C# to create files:
image

Data

There's also a discussion on stackoverflow about preallocation with more details:
https://stackoverflow.com/questions/53334343/windows-refs-ntfs-file-preallocation-hint

.NET including support for preallocation hints would be very welcome feature for reducing disk fragmentation on servers (file uploads) and desktop applications (installers and build tools). Please consider adding support for this feature into the next version of the runtime.

@dmex dmex added the tenet-performance Performance related issue label Dec 11, 2020
@iSazonov
Copy link
Contributor

I like it on Windows but what are the benefits on Unix?

@danmoseley
Copy link
Member

Thanks for the suggestion. It would be interesting for someone to propose the API changes. (using template but in this issue)

@danmoseley danmoseley added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Dec 17, 2020
@dmex
Copy link
Author

dmex commented Dec 21, 2020

@danmosemsft

I would propose adding an extra parameter to the FileSteam class so optimizations can be made throughout the framework:

- public FileStream(String path, FileMode mode, FileAccess access, FileShare share, int bufferSize)  {
+ public FileStream(String path, FileMode mode, FileAccess access, FileShare share, int bufferSize, long allocationSize) 

For example the FileSystem.CopyFile function:

// Copy the contents of the file from the source to the destination, creating the destination in the process
using (var src = new FileStream(sourceFullPath, FileMode.Open, FileAccess.Read, FileShare.Read, DefaultBufferSize, FileOptions.None))
using (var dst = new FileStream(destFullPath, overwrite ? FileMode.Create : FileMode.CreateNew, FileAccess.ReadWrite, FileShare.None, DefaultBufferSize, FileOptions.None))

Can be changed to pass src.Length into dst :

using (var src = new FileStream(sourceFullPath, FileMode.Open, FileAccess.Read, FileShare.Read, DefaultBufferSize, FileOptions.None))
using (var dst = new FileStream(
          destFullPath, 
          overwrite ? FileMode.Create : FileMode.CreateNew, 
          FileAccess.ReadWrite, 
          FileShare.None, 
          DefaultBufferSize, 
          src.Length, // <-- value passed to fallocate and/or NtCreateFile as the AllocationSize
          FileOptions.None))
{
    Interop.CheckIo(Interop.Sys.CopyFile(src.SafeFileHandle, dst.SafeFileHandle));
}

File uploads via ASP.NET could also be optimized: (this uses the sample code from msdn)

using (var stream = System.IO.File.Create(filePath, allocationSize: formFile.Length)) // <-- formFile.Length passed to fallocate and/or NtCreateFile
{
     await formFile.CopyToAsync(stream);
}

@iSazonov

I like it on Windows but what are the benefits on Unix?

Linux/Unix has disk fragmentation just like Windows but most linux/unix tools are using the fallocate syscall so its not really an issue but .NET doesn't use fallocate and so the performance of .NET is at a disadvantage on Linux compared to other tools.

.NET should include support for fallocate on Linux/Unix:
fallocate: https://man7.org/linux/man-pages/man2/fallocate.2.html
posix_fallocate: https://man7.org/linux/man-pages/man3/posix_fallocate.3.html

You can view the fragmentation using the fsck command:
fsck -fnv /dev/sdb

image

@iSazonov
Copy link
Contributor

@dmex Under Unix I assume not only Linux. We need a consistency behavior on all supported platforms including MacOs, FreeBSD. So we need a workaround for Linux fallocate on the platforms.

long allocationSize

preallocationSize name looks better for me.

@dmex
Copy link
Author

dmex commented Dec 22, 2020

We need a consistency behavior on all supported platforms including MacOs, FreeBSD. So we need a workaround for Linux fallocate on the platforms.

BSD has posix_fallocate (BSD documentation) and OSX has fcntl with the F_PREALLOCATE (OSX documentation) flag.

There isn't much documentation on OSX about F_PREALLOCATE but there's an example here: https://lists.apple.com/archives/darwin-dev/2007/Dec/msg00040.html

preallocationSize name looks better for me.

preallocationSize would probably be more consistent and a lot better at conveying the meaning.

@adamsitnik
Copy link
Member

@dmex I've added it to #40359 and we are going to try to fix it for .NET 6.0

@adamsitnik
Copy link
Member

adamsitnik commented Mar 25, 2021

Background and Motivation

Specifying file allocation size when creating the file can improve:

  • performance as the following writes would not need to extend the file, but just write the content
  • reliability as copying files could fail as soon as we know there is not enough space on the disk when creating the file, not after we copy a part of the file and find out there is not enough space for the rest of it

Proposed API

namespace System.IO
{
    public class FileStream : Stream
    {
        public FileStream(string path, FileMode mode)
        public FileStream(string path, FileMode mode, FileAccess access)
        public FileStream(string path, FileMode mode, FileAccess access, FileShare share)
        public FileStream(string path, FileMode mode, FileAccess access, FileShare share, int bufferSize)
        public FileStream(string path, FileMode mode, FileAccess access, FileShare share, int bufferSize, bool useAsync)
        public FileStream(string path, FileMode mode, FileAccess access, FileShare share, int bufferSize, FileOptions options)
+       public FileStream(string path, FileMode mode, FileAccess access, FileShare share, int bufferSize, FileOptions options, long allocationSize)        
    }
    
    public partial class StreamWriter : System.IO.TextWriter
    {
        public StreamWriter(Stream stream)
        public StreamWriter(Stream stream, System.Text.Encoding encoding)
        public StreamWriter(Stream stream, System.Text.Encoding encoding, int bufferSize)
        public StreamWriter(Stream stream, System.Text.Encoding? encoding = null, int bufferSize = -1, bool leaveOpen = false)
        public StreamWriter(string path)
        public StreamWriter(string path, bool append)
        public StreamWriter(string path, bool append, System.Text.Encoding encoding)
+       public StreamWriter(string path, bool append = false, System.Text.Encoding? encoding, int bufferSize = -1, FileOptions options = FileOptions.None, long allocationSize = -1)
    }
    
    public static partial class File
    {
        public static FileStream Create(string path)
        public static FileStream Create(string path, int bufferSize) 
        public static FileStream Create(string path, int bufferSize, FileOptions options)
+       public static FileStream Create(string path, int bufferSize, FileOptions options, long allocationSize)
    }
    
    public sealed partial class FileInfo : System.IO.FileSystemInfo
    {
        public FileInfo(string fileName)
        public System.IO.FileStream Create()
+       public System.IO.FileStream Create(FileOptions options, long allocationSize)
    }
 }

Note: Some of the methods above add FileOptions options to the arguments list. This has been done to avoid conflict with #24698 which is going to hopefully add them anyway.

Usage Examples

using (FileStream source = File.Open(sourceFilePath))
using (FileStream destination = File.Create(destinationFilePath, allocationSize: source.Length))
{
     source.CopyTo(destination);
}

Alternative Designs

The proposed design is so simple that it's hard to come up with an alternative.

Risks

The API itself does not introduce breaking changes, but using it in all *File*.Copy* and File.WriteAll* methods can potentially introduce breaking changes if we don't pay attention to what exception was thrown when we were running out of disk space so far.

@adamsitnik adamsitnik self-assigned this Mar 25, 2021
@adamsitnik adamsitnik added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Mar 25, 2021
@carlossanlop
Copy link
Member

@adamsitnik I left a comment in the other API proposal that is relevant to this proposal (wrapping new arguments in a new type, rather than adding new parameters to existing APIs).

@terrajobst terrajobst added the blocking Marks issues that we want to fast track in order to unblock other important work label Mar 29, 2021
@bartonjs
Copy link
Member

bartonjs commented Mar 30, 2021

Video

  • Some of the existing methods are getting really long signatures, let's start defaulting these.
  • We asked if we really needed/wanted all 4, and it seemed like a reasonable thing to do.
namespace System.IO
{
    public class FileStream : Stream
    {
        public FileStream(string path, FileMode mode, FileAccess access = appropriate default, FileShare share = appropriate default, int bufferSize = appropriate default, FileOptions options = appropriate default, long allocationSize = appropriate default)
    }
    
    public partial class StreamWriter : System.IO.TextWriter
    {
        public StreamWriter(string path, bool append = false, System.Text.Encoding? encoding, int bufferSize = -1, FileOptions options = FileOptions.None, long allocationSize = -1)
    }
    
    public static partial class File
    {
        public static FileStream Create(string path, int bufferSize = appropriate default, FileOptions options = appropriate default, long allocationSize = appropriate default)
    }
    
    public sealed partial class FileInfo : System.IO.FileSystemInfo
    {
        public System.IO.FileStream Create(FileOptions options = appropriate default, long allocationSize = appropriate default)
    }
 }

@bartonjs bartonjs added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Mar 30, 2021
@stephentoub
Copy link
Member

This is the second issue that just approved making the overloads of these methods even longer. Did we discuss adding options bags and then a single overload taking that bag? (Had conflicts today and couldn't make the API reviews.)

@bartonjs
Copy link
Member

The question came up. Partially it was that no one had a good name.

3 of the 4 of these are building on members newly added from #24698. Hopefully they're just added as the one member with defaults, which really makes this just adding one new member, not 4.

@stephentoub
Copy link
Member

stephentoub commented Mar 30, 2021

3 of the 4 of these are building on members newly added from #24698

Right, I have the same concerns there, adding 27 (!) new overloads.

I'm surprised these were approved.

@danmoseley
Copy link
Member

As well as being more overloads to select from, the longer set of parameters makes it more daunting to use the constructor, because there is no longer a clear progression of increasingly less used parameters, but instead you may need to comma past several that you want to keep as default in order to specify a value for the one you care about; and the code also becomes harder to read (eg., wider).

Re naming, it seems we have a fair bit of precedent for XXOptions, as well as ProcessStartInfo.

@bartonjs
Copy link
Member

Re naming, it seems we have a fair bit of precedent for XXOptions,

Yeah, but "FileOptions" is already the name of one of the parameter types (a flags enum). Already used the good name 😄

@dmex
Copy link
Author

dmex commented Mar 30, 2021

A few name suggestions... FileStreamOptions, FileStreamExtendedOptions, FileExtendedParameters

The last one would probably be the most descriptive.

@danmoseley
Copy link
Member

danmoseley commented Mar 30, 2021

Yeah, but "FileOptions" is already the name of one of the parameter types (a flags enum). Already used the good name 😄

I figured FileStreamOptions...

Maybe we should have an analyzer in our SDK that enforces some kind of heuristic limit on # constructors x # parameters 🙂

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Apr 12, 2021
@saucecontrol
Copy link
Member

I figured FileStreamOptions...

With the recent overhaul around FileStreamStrategy and buffering and what-not, I was thinking it would be nice to see an option to have ephemeral FileStreams grab their buffer from ArrayPool instead of allocating a new one. I haven't opened an API proposal for that because I assume there's no way another set of overloads would pass review, but having an Options class to stuff things into would change the dynamic, no?

@stephentoub stephentoub removed the blocking Marks issues that we want to fast track in order to unblock other important work label Apr 15, 2021
@adamsitnik
Copy link
Member

I've created a new proposal for the option bag: #52446

@svick
Copy link
Contributor

svick commented May 7, 2021

Is the name allocationSize sufficiently clear? Are people going to understand that it's referring to the size of the file on disk and not to to the size of some buffer used by the stream? I think changing the name to something like fileAllocationSize or initialFileSize would be better. (The name preallocationSize was suggested in #45946 (comment), but I'm not sure if that helps in this regard.)

@dmex
Copy link
Author

dmex commented May 7, 2021

Is the name allocationSize sufficiently clear?

I thought it would be preallocationSize?

The usage example created by @adamsitnik in the proposal #52446 uses a fixed preallocation hint which isn't the best usage example: AllocationSize = 1024 * 1024,

You generally only use preallocation with a known length:

// Opening file for Read:
var basic = new FileStreamOptions
{
    Path = @"C:\FrameworkDesignGuidelines.pdf",
    Mode = FileMode.Open,
    Access = FileAccess.Read,
};
var read = new FileStream(basic);

var advanced = new FileStreamOptions
{
    Path = @"C:\Copy.pdf",
    Mode = FileMode.CreateNew,
    Access = FileAccess.Write,
    Share = FileShare.None,
    AllocationSize = read.Length // <-- Pass length as preallocation hint
};
var write = new FileStream(advanced);

read.CopyTo(write);

AllocationSize would confuse some to think it's a buffer allocation rather than the physical file allocation preformed by the disk driver.

@adamsitnik
Copy link
Member

Are people going to understand that it's referring to the size of the file on disk and not to to the size of some buffer used by the stream?

In the main proposal of #52446 users can't provide bufferSize anymore so I hope that it would not be too confusing.

I think changing the name to something like fileAllocationSize

I wanted to avoid file prefix on purpose, as it's already contained in the name of the type (FileStreamOptions)

initialFileSize would be better

The problem is that it's actually not the file size. If you specify the allocationSize to A but write only B bytes where B < A the file size is going to be equal to B on Windows and A on Unix ;)

I thought it would be preallocationSize?

I like this name, I am going to update the proposal

My other idea is guaranteedDiskSpace as this argument ensures that disk space is allocated. @svick @dmex how does it sound to you?

@dmex thanks for the improved example, I've added it to the proposal

@dmex
Copy link
Author

dmex commented May 11, 2021

My other idea is guaranteedDiskSpace as this argument ensures that disk space is allocated. @svick @dmex how does it sound to you?

Either name works I guess.... I like preallocationSize since it's consistent with native implementations and helps when refering to documentation used elsewhere but then again most use-cases are going to be 'guaranteeing disk space' exists I guess. Either name works 👍

System.IO - FileStream automation moved this from To do to Done May 18, 2021
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label May 18, 2021
@dotnet dotnet locked as resolved and limited conversation to collaborators Jun 17, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.IO tenet-performance Performance related issue
Projects
Development

Successfully merging a pull request may close this issue.