Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow for atomic appends to end of file #53432

Closed
adamsitnik opened this issue May 28, 2021 · 20 comments
Closed

Allow for atomic appends to end of file #53432

adamsitnik opened this issue May 28, 2021 · 20 comments
Assignees
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.IO
Milestone

Comments

@adamsitnik
Copy link
Member

adamsitnik commented May 28, 2021

Link to proposal: #53432 (comment)

Discussion

While working on a new API called File.OpenHandle I've realized that FileStream currently does not support handles that have been opened for appending (FILE_APPEND_DATA on Windows and O_APPEND on Unix).

Moreover, we don't use these flags for FileMode.Append when we are opening a file from a path. Instead, we mimic the append by setting the file offset to the end of the file:

if (mode == FileMode.Append)
{
_appendStart = _filePosition = Length;
}

if (mode == FileMode.Append)
{
// Jump to the end of the file if opened as Append.
_appendStart = SeekCore(_fileHandle, 0, SeekOrigin.End);
}

if (mode == FileMode.Append)
{
_appendStart = SeekCore(_fileHandle, 0, SeekOrigin.End);
}

I did some quick web search and it turns out that it was possible with .NET Framework: https://stackoverflow.com/questions/1862309/how-can-i-do-an-atomic-write-append-in-c-or-how-do-i-get-files-opened-with-the

And some of our users (@nblumhardt) are missing it in .NET Core: https://nblumhardt.com/2016/08/atomic-shared-log-file-writes/

@stephentoub @ericstj @JeremyKuhne is there any chance that you remember why we have decided to implement it in such a way?

cc @Jozkee @carlossanlop

@ghost
Copy link

ghost commented May 28, 2021

Tagging subscribers to this area: @carlossanlop
See info in area-owners.md if you want to be subscribed.

Issue Details

While working on a new API called File.OpenHandle I've realized that FileStream currently does not support handles that have been opened for appending (FILE_APPEND_DATA on Windows and O_APPEND on Unix).

Moreover, we don't use these flags for FileMode.Append when we are opening a file from a path. Instead, we mimic the append by setting the file offset to the end of the file:

if (mode == FileMode.Append)
{
_appendStart = _filePosition = Length;
}

if (mode == FileMode.Append)
{
// Jump to the end of the file if opened as Append.
_appendStart = SeekCore(_fileHandle, 0, SeekOrigin.End);
}

if (mode == FileMode.Append)
{
_appendStart = SeekCore(_fileHandle, 0, SeekOrigin.End);
}

I did some quick web search and it turns out that it was possible with .NET Framework: https://stackoverflow.com/questions/1862309/how-can-i-do-an-atomic-write-append-in-c-or-how-do-i-get-files-opened-with-the

And some of our users (@nblumhardt) are missing it in .NET Core: https://nblumhardt.com/2016/08/atomic-shared-log-file-writes/

@stephentoub @ericstj @JeremyKuhne is there any chance that you remember why we have decided to implement it in such a way?

cc @Jozkee @carlossanlop

Author: adamsitnik
Assignees: -
Labels:

Discussion, area-System.IO

Milestone: -

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label May 28, 2021
@stephentoub
Copy link
Member

stephentoub commented May 28, 2021

Moreover, we don't use these flags for FileMode.Append when we are opening a file from a path. Instead, we mimic the append by setting the file offset to the end of the file:

.NET Framework does the same, for the constructors that exist in both places.

.NET Framework also has ctors that accept a FileSystemRights, and that rights information (which has FileSystemRights.AppendData that maps to FILE_APPEND_DATA) is passed down into the CreateFile call. I assume we didn't add those constructors for layering reasons. But .NET Core does have FileSystemAclExtensions.Create which lets you create a FileStream with a FileSystemRights.

So, the same APIs do the same thing in both places. And the additional functionality is available in both places, just with a ctor in netfx and a Create method in core.

@ericstj
Copy link
Member

ericstj commented May 28, 2021

@stephentoub is correct. In general we were focused on porting and maintaining behavior early on in .NETCore. Portability and compatibility was very important. We weren't making behavior changes if we could avoid it. ACLs as a whole were not in because they didn't work well cross-plat. @mjrousos initially did a port of all the ACL APIs for Test.NET (an internal set of library implementations that were developed in order to port more test code) and that eventually became the *AccessControl packages in the compat pack.

I've had some discussions with @terrajobst and @ViktorHofer lately about adding ACLs back. Now that we have the platform support features to warn people about unsupported API there's less of a downside to do this. Initially we were planning to just add all the *AccessControl packages back to the framework. Hypothetically we can add back the framework API (with SupportedOS attributes) after that, though we'd need to decide if we wanted to put platform specific APIs on our core types. That's a separate discussion though.

I am not aware of anything fundamentally blocking the addition of new APIs to support Append, or enabling it through some existing API.

@adamsitnik adamsitnik removed the untriaged New issue has not been triaged by the area owner label Jul 1, 2021
@adamsitnik adamsitnik added this to the 7.0.0 milestone Jul 1, 2021
@adamsitnik adamsitnik self-assigned this Jul 1, 2021
@ghost ghost added in-pr There is an active PR which will close this issue when it is merged and removed in-pr There is an active PR which will close this issue when it is merged labels Jul 10, 2021
@snakefoot
Copy link

snakefoot commented Aug 17, 2021

@adamsitnik Has the merge of #50166 + #51111 + #52928 + #53669 + #55513 made it possible to perform atomic-file-append on Windows and Linux-platforms on Net6.0 ? (Allowing 2 processes to append concurrently to the same file)

According to old scriptures then one should set both FILE_APPEND_DATA + SYNCHRONIZE and BufferSize = 1 to enable atomic-append on Windows. But according to other sources then it seems FILE_APPEND_DATA or O_APPEND is enough.

@adamsitnik
Copy link
Member Author

@snakefoot #55465 was implementing the atomic append support, but it did not get merged as it would introduce a breaking change related to changing FileStream.Position (personally I don't understand why anyone would like to seek a file stream that was opened for appending to the end of file). I'll try to get back to this idea in .NET 7.

@snakefoot
Copy link

@adamsitnik Maybe it could be part of the GetNet5CompatFileStreamSetting, as it already adds fallback option for "breaking behavior with FileMode.Append".

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Aug 19, 2021
@adamsitnik
Copy link
Member Author

@snakefoot @nblumhardt @iSazonov I've re-opened #55465, PTAL and let me know what do you think about #55465 (comment)

@iSazonov
Copy link
Contributor

#55465 is still locked for conversations so I leave my comment here.

I think the docs say "open file for writing and limit permissions only to add new data to end file". From this I'd expect we can do some append writes before closing the file.

@adamsitnik
Copy link
Member Author

#55465 is still locked for conversations so I leave my comment here.

@iSazonov please excuse me for that, the PR got closed a while ago and I've not noticed that it got locked by a bot

image

I've unlocked it.

@adamsitnik adamsitnik changed the title Is current FileMode.Append implementation optimal? Allow for atomic appends to end of file Aug 20, 2021
@adamsitnik
Copy link
Member Author

adamsitnik commented Aug 20, 2021

Background and motivation

As of today, it's currently impossible to perform atomic file appends with .NET.

When the file is being opened with existing FileMode.Append flag, we store the file length as current file offset and perform next writes starting from this offset.

There is no guarantee that the file length won't be modified in the meantime (EOF might change).

Example:

using System;
using System.IO;
using System.Text;
using System.Threading;

namespace atomicAppend
{
    class Program
    {
        static void Main()
        {
            const string FilePath = "logFile.txt";

            if (File.Exists(FilePath)) File.Delete(FilePath);

            Thread t1 = new Thread(Log), t2 = new Thread(Log);
            t1.Start(FilePath); t2.Start(FilePath);
            t1.Join(); t2.Join();
        }

        static void Log(object filePath)
        {
            using (FileStream fs = new FileStream((string)filePath, FileMode.Append, FileAccess.Write, FileShare.Write, 1, false))
            {
                for (int i = 0; i < 10; i++)
                {
                    fs.Write(Encoding.Unicode.GetBytes($"{DateTime.UtcNow}: thread {Thread.CurrentThread.ManagedThreadId} iteration {i}\n"));
                }
            }
        }
    }
}

Sample output:

8/20/2021 6:50:08 AM: thread 5 iteration 0
8/20/2021 6:50:08 AM: thread 5 iteration 1
8/20/2021 6:50:08 AM: thread 5 iteration 2
8/20/2021 6:50:08 AM: thread 5 iteration 3
8/20/2021 6:50:08 AM: thread 5 iteration 4
8/20/2021 6:50:08 AM: thread 5 iteration 5
8/20/2021 6:50:08 AM: thread 5 iteration 6
8/20/2021 6:50:08 AM: thread 5 iteration 7
8/20/2021 6:50:08 AM: thread 5 iteration 8
8/20/2021 6:50:08 AM: thread 5 iteration 9

As you can see, both threads were overwriting the content instead of appending it to end of file.

We would expect to have something like:

8/20/2021 6:50:16 AM: thread 4 iteration 0
8/20/2021 6:50:16 AM: thread 5 iteration 0
8/20/2021 6:50:16 AM: thread 4 iteration 1
8/20/2021 6:50:16 AM: thread 5 iteration 1
8/20/2021 6:50:16 AM: thread 4 iteration 2
8/20/2021 6:50:16 AM: thread 5 iteration 2
8/20/2021 6:50:16 AM: thread 4 iteration 3
8/20/2021 6:50:16 AM: thread 5 iteration 3
8/20/2021 6:50:16 AM: thread 4 iteration 4
8/20/2021 6:50:16 AM: thread 5 iteration 4
8/20/2021 6:50:16 AM: thread 4 iteration 5
8/20/2021 6:50:16 AM: thread 4 iteration 6
8/20/2021 6:50:16 AM: thread 4 iteration 7
8/20/2021 6:50:16 AM: thread 5 iteration 5
8/20/2021 6:50:16 AM: thread 4 iteration 8
8/20/2021 6:50:16 AM: thread 5 iteration 6
8/20/2021 6:50:16 AM: thread 4 iteration 9
8/20/2021 6:50:16 AM: thread 5 iteration 7
8/20/2021 6:50:16 AM: thread 5 iteration 8
8/20/2021 6:50:16 AM: thread 5 iteration 9

API Proposal

namespace System.IO
{
    public enum FileMode
    {
        Append = 6,
+       AppendAtomic = 7,
    }
}

The main difference with FileMode.Append:

  • using native OS capabilities to ensure atomic appends to end of file (O_APPEND and FILE_APPEND_DATA)
  • call to Seek and set_Position would always throw (atomic append is an append to end of file, not to given offset)
  • FileStream buffering disabled by default

API Usage

using (FileStream fs = new FileStream(filePath, FileMode.AtomicAppend, FileAccess.Write, FileShare.Write))
{
    for (int i = 0; i < 10; i++)
    {
        fs.Write(Encoding.Unicode.GetBytes($"{DateTime.UtcNow}: thread {Thread.CurrentThread.ManagedThreadId} iteration {i}\n"));
    }
}

Risks

I can't see any.

@adamsitnik adamsitnik added the api-ready-for-review API is ready for review, it is NOT ready for implementation label Aug 20, 2021
@iSazonov
Copy link
Contributor

How will FileShare work with AppendAtomic?

@adamsitnik
Copy link
Member Author

How will FileShare work with AppendAtomic?

FileShare.Write would allow others to append to the same file as well. FileShare.Read would allow others for reading, while FileShare.None would not allow anyone else to use it. Does that answer your question?

@Tornhoof
Copy link
Contributor

Tornhoof commented Aug 20, 2021

If I understand that correctly, the "normal" assumption is that Append would be atomic?
Maybe make this a breaking change (and add "AppendNonAtomic" instead) or add a roslyn analyzer which warns about Append without atomic?

I admit that a breaking change would be rather harsh if you throw exceptions for seek etc., as probably every logger in existence will then break.

@adamsitnik
Copy link
Member Author

If I understand that correctly, the "normal" assumption is that Append would be atomic?

It would be doable (#55465), but at a cost of introducing a breaking change to FileStream.Seek (we would not be able to set the position before current EOF). More context: #55465 (comment)

@iSazonov
Copy link
Contributor

My question was tricky :-). I am thinking if the weird behavior with Append comes from multi-threaded scenario we could consider add new option in FileShare enum so that lock other threads/processes if needed.

@omariom
Copy link
Contributor

omariom commented Aug 20, 2021

FileStream buffering disabled by default

Will it throw if the buffer size is specified?

@adamsitnik
Copy link
Member Author

Will it throw if the buffer size is specified?

Throw or be ignored.

The problem with buffering is that it might split large messages into chunks, and with other producers writing to the same file it may produce unexpected results.

bufferSize = 2
thread_1: fileStream.Write("AABB"); // translated to two separate writes: AA and BB
thread_2: fileStream.Write("c"); // might happen between the two writes above
possible outputs: AABBc, AAcBB, cAABB
bufferSize = 1
thread_1: fileStream.Write("AABB"); // single write (Unix*)
thread_2: fileStream.Write("c");  // single write
possible output: AABBc

(Unix*) - would need to check write behavior on UNIX for files opend with O_APPEND when the write operation is interrupted

Users also seem to not want to have buffering enabled for atomic appends: https://stackoverflow.com/a/5504226/5852046

@adamsitnik
Copy link
Member Author

I am thinking if the weird behavior with Append comes from multi-threaded scenario we could consider add new option in FileShare enum so that lock other threads/processes if needed.

There is no need for that, every OS exposes a feature that allows for thread-safe atomic appends to end of file. In #55465 I've implemented that by:

  • Windows: specyfing both the Offset and OffsetHigh members of the OVERLAPPED structure as 0xFFFFFFFF (docs)
  • Unix: using O_APPEND when opening file and then using write() (not pwritev) for writing

@iSazonov
Copy link
Contributor

There is no need for that, every OS exposes a feature that allows for thread-safe atomic appends to end of file.

Yes, but it is implementation details and my thoughts more about .Net behavior, e.g. Append with new FileShare option could expose new atomic append functionality without breaking change. I'm not sure we can benefit from it, but I had to mention it.

@terrajobst terrajobst added api-needs-work API needs work before it is approved, it is NOT ready for implementation and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Aug 24, 2021
@terrajobst
Copy link
Member

terrajobst commented Aug 24, 2021

Video

  • We should measure the performance overhead and establish clarity on what we believe the "right behavior" is for new code. One option, that we shouldn't dismiss, is doing a behavior breaking change in .NET 7 and provide compat switch. However, that's only appropriate if we believe being broken is a fringe case and code should generally use the atomic append behavior. There are conceivable scenarios where someone might open multiple handles within the same process and the new behavior would regress performance.
  • Another middle ground option is to expose the new setting and change the behavior of higher level APIs, such as File.AppendLines. This way, we don't make the average person's code more complex while also giving the power users the control they need.
  • @adamsitnik will do some benchmarking and come back with findings.

@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Sep 19, 2021
System.IO - FileStream automation moved this from To do to Done Oct 12, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Nov 11, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.IO
Projects
Development

Successfully merging a pull request may close this issue.

8 participants