
Fix reordering/reentrancy bug in NIOAsyncWriter + NIOAsyncChannel #2587

Merged
merged 2 commits into main from fb-async-writer-reentrancy on Nov 8, 2023

Conversation

FranzBusch
Contributor

Motivation

While testing the latest async interfaces we found a potential reordering/reentrancy bug in the `NIOAsyncWriter`. This was caused by our latest performance changes, where we fast-pathed calls to `didYield` to avoid hopping. The problem was in the following flow:

  1. Task 1: Calls `outbound.write()` -> which led to an `EventLoop` enqueue with `didYield`
  2. Task 1: Calls `outbound.write()` -> which led to an `EventLoop` enqueue with `didYield`
  3. EventLoop: While processing the write from 1. the channel became **not** writable
  4. Task 1: Calls `outbound.write()` -> which led to buffering the write in the writer's state machine since we are **not** writable
  5. EventLoop: While still processing the write from 1. the channel became writable again -> We informed the `NIOAsyncWriter` about this, which unbuffered the write from 4. that was stored in the state machine and called `didYield`. Since we were already on the `EventLoop`, we processed the write right away

The flow above shows how the writes from 2. and 4. got reordered.
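The reordering can be modelled with a small, self-contained sketch (purely illustrative; the queue, buffer, and write numbers are hypothetical stand-ins for the real `EventLoop` and state machine — here the second and third writes swap, mirroring the PR's writes 2. and 4.):

```swift
// Toy model of the fast-path reordering (illustrative only; none of these
// names are NIO's). Writes 1 and 2 enqueue their didYield onto a mock
// event-loop queue; write 3 is buffered because the channel is unwritable.
var delivered: [Int] = []          // order in which "didYield" sees writes
var loopQueue: [() -> Void] = []   // mock EventLoop task queue
var bufferedWrites: [Int] = []     // writer state machine buffer

loopQueue.append { delivered.append(1) }  // step 1: enqueued didYield
loopQueue.append { delivered.append(2) }  // step 2: enqueued didYield
bufferedWrites.append(3)                  // step 4: buffered, not writable

// Steps 3/5: while the loop processes write 1, the channel turns writable
// again; since we are already on the loop, the fast path delivers the
// buffered write 3 immediately, jumping ahead of the enqueued write 2.
loopQueue.removeFirst()()                    // write 1 runs
delivered.append(contentsOf: bufferedWrites) // fast-path delivery of write 3
bufferedWrites.removeAll()
loopQueue.removeFirst()()                    // write 2 finally runs

print(delivered) // [1, 3, 2]
```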

Modification

This PR fixes the above issue while upholding a few constraints:

  1. Produce as few context switches as possible
  2. Minimize allocations

I tried different approaches but in the end decided to do the following:

  1. Make sure to never call `didYield`/`didTerminate` from calls on the `NIOAsyncWriter.Sink`
  2. Don't coalesce the elements of different writes in the `NIOAsyncWriter`, but rather use the suspended tasks to retry a write after they were suspended. I chose to do this since I wanted to avoid any allocation (remember writers are `some Sequence`) and because we assume that continuous contention in a multi-producer pattern is low.
  3. Make sure that `Sink.finish()` is terminal and does not lead to a `didTerminate` event. This is in line with 1.

One important thing to call out: since our `writer.finish()` method is not `async`, we have to buffer the finish event and deliver it with the yield that got buffered before we transitioned to `writerFinished`.
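A minimal sketch of that buffering, assuming a hypothetical simplified state machine (the enum and function names are illustrative, not NIO's actual implementation):

```swift
// Because finish() cannot suspend, the state machine only records the
// finish and delivers it together with the previously buffered yields.
enum State {
    case streaming(bufferedYields: [Int])
    case writerFinished(bufferedYields: [Int]) // finish seen, yields pending
    case finished
}

var state = State.streaming(bufferedYields: [42])

// writer.finish() is synchronous: it only transitions the state and returns.
func finish() {
    if case .streaming(let yields) = state {
        state = .writerFinished(bufferedYields: yields)
    }
}

// When the sink drains the buffered yields it also observes the finish.
func drain() -> (yields: [Int], didFinish: Bool) {
    if case .writerFinished(let yields) = state {
        state = .finished
        return (yields, true)
    }
    return ([], false)
}

finish()
let drained = drain()
print(drained.yields, drained.didFinish) // [42] true
```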

Result

No more reordering/reentrancy problems in `NIOAsyncWriter` or `NIOAsyncChannel`.

@FranzBusch FranzBusch added the patch-version-bump-only For PRs that when merged will only cause a bump of the patch version, ie. 1.0.x -> 1.0.(x+1) label Nov 6, 2023
///
/// - Note: This is guaranteed to be called _exactly_ once.
/// - Note: This is guaranteed to be called _at most_ once.
Contributor

I'm a bit concerned about this change in behaviour, users could be relying on it as a signal to e.g. tear some things down.

Contributor Author

So the behaviour here changed so that we only call `didTerminate` when somebody called `writer.finish()` and no longer on `sink.finish()`. The logic before was quite weird and it caused problems with overlapping exclusive access that I fixed in previous PRs. Now the logic is that a call to `didTerminate` should result in the underlying resource triggering termination, and then a call to `sink.didFinish` indicates that this is done.
I am on the fence with this change, so I'm happy to discuss this more.

Contributor

Do you think anyone outside of NIO actually implements this? Might be worth having a quick search.

Contributor Author

I just did a quick GH search and couldn't find anyone besides grpc-swift using this, and there we are only relying on `didTerminate` being called from `writer.finish` and not from `sink.finish`, which is the intention of this PR.


case .retry:
// The yield has to be retried. We are recursively calling yield here
// but don't expect to blow the stack.
Contributor

Why do we think this won't happen?

Contributor Author

I changed this to a while loop now
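A sketch of what such a loop-based retry can look like (hypothetical names and a mock state machine; the real one returns richer actions):

```swift
// Each attempt asks the state machine what to do; on .retry we loop
// instead of recursing, so repeated contention cannot grow the stack.
enum YieldAction { case yielded, retry }

var attemptsUntilAccepted = 3
func stateMachineYield() -> YieldAction {
    attemptsUntilAccepted -= 1
    return attemptsUntilAccepted > 0 ? .retry : .yielded
}

var attempts = 0
yieldLoop: while true {
    attempts += 1
    switch stateMachineYield() {
    case .yielded:
        break yieldLoop  // element accepted, we are done
    case .retry:
        continue         // re-drive the yield without recursion
    }
}
print(attempts) // 3
```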

Sources/NIOCore/AsyncSequences/NIOAsyncWriter.swift (outdated, resolved)
inDelegateOutcall: Bool,
suspendedYields: _TinyArray<SuspendedYield>,
cancelledYields: [YieldID],
knownYieldIDs: _TinyArray<YieldID>,
Contributor

What are the known yield IDs?

Contributor Author

I added a comment to clarify this but they are the buffered yields before we got the finish.
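In other words (a hypothetical illustration, not NIO's code): the known yield IDs are the IDs of yields that were already buffered when the finish arrived, and only those may still complete:

```swift
typealias YieldID = Int

// IDs of yields buffered before finish() was called;
// any yield arriving with an unknown ID can be rejected.
let knownYieldIDs: [YieldID] = [1, 2]
func mayComplete(_ id: YieldID) -> Bool { knownYieldIDs.contains(id) }

print(mayComplete(2), mayComplete(5)) // true false
```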


switch action {
case .callDidYield(let delegate):
// We are allocating a new Deque for every write here
Contributor

Ouch. I didn't realise we did this.

Contributor Author

Yeah, this was one of the early design choices, because users can yield us any kind of sequence and we have to normalise it into the `Deque` case. Right now we provide a fast path for a single element; we ought to be able to do the same for `Deque`/`Array` and more.
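A sketch of such a single-element fast path (illustrative; the real code normalises into a `Deque` from swift-collections, while this sketch uses `Array` to stay dependency-free, and the enum and function names are made up):

```swift
// Only materialise a container when the yielded sequence has more than
// one element; a single element skips the container allocation entirely.
enum Yield<Element> {
    case single(Element)      // fast path: no container allocated
    case multiple([Element])  // general path: sequence materialised
}

func normalise<S: Sequence>(_ sequence: S) -> Yield<S.Element> {
    var iterator = sequence.makeIterator()
    guard let first = iterator.next() else { return .multiple([]) }
    guard let second = iterator.next() else { return .single(first) }
    var elements = [first, second]
    while let next = iterator.next() { elements.append(next) }
    return .multiple(elements)
}

switch normalise([7]) {
case .single(let element): print("single", element)     // taken: single 7
case .multiple(let elements): print("multiple", elements)
}
```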

@FranzBusch FranzBusch merged commit 035141d into apple:main Nov 8, 2023
7 of 8 checks passed
@FranzBusch FranzBusch deleted the fb-async-writer-reentrancy branch November 8, 2023 10:38