
Fix graceful connection shutdown #7345

Merged
merged 8 commits into dotnet:3.x on Nov 3, 2021

Conversation

ReubenBond (Member) commented Oct 21, 2021

This is a fix for #7337.
Essentially, there was a race during connection teardown in which the underlying buffers were being yanked from the incoming/outgoing processing loops and returned to the memory pool. The culprit is the call to {PipeReader,PipeWriter}.Complete().

The PR also brings DuplexPipeStream in line with ASP.NET Core's impl, since it has changed since we first forked it.

cc @halter73 @davidfowl - does this look about right to you? I'm explicitly calling Context.Abort() after the connection finishes its processing loops, since calling {PipeReader,PipeWriter}.Complete() isn't enough for the TLS case. Is there something else that we should be doing instead?
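For readers outside the codebase, here is a minimal sketch of the ordering the PR moves to. The type and member names are illustrative, not the actual Orleans internals; compare with the RunInternal diff further down.

```csharp
using System.Threading.Tasks;
using Microsoft.AspNetCore.Connections;

// Illustrative shape only: drain both processing loops before touching the transport.
// Completing or aborting the pipes while a loop is still mid-read can return buffers
// to the memory pool while that loop still references them.
internal abstract class ConnectionSketch
{
    protected ConnectionContext Context { get; init; } = default!;

    protected abstract Task ProcessIncoming();
    protected abstract Task ProcessOutgoing();

    protected async Task RunInternal()
    {
        var incoming = ProcessIncoming();
        var outgoing = ProcessOutgoing();
        await Task.WhenAll(incoming, outgoing);

        // Only after both loops are done with the pipes is it safe to tell the
        // underlying transport to shut down.
        Context.Abort();
    }
}
```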


ReubenBond added this to the 3.5.1 milestone Oct 21, 2021
if (_processIncomingTask is not null && _processOutgoingTask is not null)
{
// Wait for the transport to signal that it's closed before disposing it.
await _transportConnectionClosed.Task;
ReubenBond (Member, Author)

The completion of this task is signaled by a delegate registered with the Context.ConnectionClosed CancellationToken.
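A minimal sketch of that signalling, assuming a TaskCompletionSource-backed field (the class, field, and method names here are placeholders, not the exact Orleans members):

```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Connections;

// A delegate registered on Context.ConnectionClosed completes a TaskCompletionSource,
// and the dispose path awaits that task before tearing the transport down.
internal sealed class ConnectionClosedSignal
{
    private readonly TaskCompletionSource<object?> _transportConnectionClosed =
        new(TaskCreationOptions.RunContinuationsAsynchronously);

    public Task Closed => _transportConnectionClosed.Task;

    public void Attach(ConnectionContext context)
    {
        context.ConnectionClosed.Register(
            static state => ((TaskCompletionSource<object?>)state!).TrySetResult(null),
            _transportConnectionClosed);
    }
}
```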

Member

This should work. The ConnectionClosed token won't fire until everything is flushed unless the connection is aborted.

@@ -116,6 +118,8 @@ protected virtual async Task RunInternal()
_processIncomingTask = this.ProcessIncoming();
_processOutgoingTask = this.ProcessOutgoing();
await Task.WhenAll(_processIncomingTask, _processOutgoingTask);

Context.Abort();
ReubenBond (Member, Author)

Wait for the processing loops to complete before calling ConnectionContext.Abort().

Is there a better way to signal that the underlying transport should begin closing?

Member

Even though the processing tasks have completed here, that doesn't mean the transport is done flushing everything. Abort() could still cause truncation of the response, at least with Kestrel's transports. Is Orleans using both Kestrel and its own copy of Kestrel's Socket transport? It might be worth looking into SocketConnectionContextFactory, which was added in .NET 6 RC1.

Maybe you should be calling DisposeAsync() instead, but even that might be unnecessary since Kestrel will do this automatically when connection middleware completes. If this is always running as part of some sort of middleware, the middleware pipeline itself should probably take care of this.

ReubenBond (Member, Author)

> Is Orleans using both Kestrel and its own copy of Kestrel's Socket transport?

Yes, it is. Thanks, we might be able to use SocketConnectionContextFactory once we add net6.0 to our targets
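For reference, a rough sketch of what using the shared factory could look like on net6.0+. The endpoint, port, options, and logger here are placeholders, not Orleans configuration:

```csharp
using System.Net;
using System.Net.Sockets;
using Microsoft.AspNetCore.Connections;
using Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets;
using Microsoft.Extensions.Logging.Abstractions;

// Accept a socket and wrap it in a ConnectionContext via Kestrel's factory
// instead of a private fork of the socket transport.
var listener = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
listener.Bind(new IPEndPoint(IPAddress.Loopback, 11111)); // placeholder port
listener.Listen(backlog: 512);

using var factory = new SocketConnectionContextFactory(
    new SocketConnectionFactoryOptions(),
    NullLogger.Instance);

Socket accepted = await listener.AcceptAsync();
ConnectionContext connection = factory.Create(accepted);

// connection.Transport is the IDuplexPipe the processing loops read from and write to;
// dispose the context only once those loops are completely done with the pipes.
await connection.DisposeAsync();
```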

ReubenBond (Member, Author)

This does always run as part of a middleware pipeline. The middleware was not exiting when TLS was added, though; it worked fine without TLS.

{
try
{
_transport?.Output.CancelPendingFlush();
Member

Is this meant to be abortive? Why wouldn't this wait on any output backpressure? Note that FlushAsync() will complete before everything is really flushed because of ResumeWriterThreshold. ConnectionContext.DisposeAsync() shouldn't complete until everything is fully flushed at the application layer. The OS does its own similar trickery with completing flushes before everything is acked, but that's unavoidable.
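To make the distinction concrete, a small hedged sketch of an output write that honours backpressure and only treats a cancelled flush as the abortive path (the class and method names are illustrative):

```csharp
using System;
using System.IO.Pipelines;
using System.Threading;
using System.Threading.Tasks;

internal static class OutputSketch
{
    // Awaiting the FlushResult is what actually waits on backpressure; a call to
    // CancelPendingFlush() abandons that wait and shows up here as IsCanceled.
    public static async Task<bool> SendAsync(
        PipeWriter output, ReadOnlyMemory<byte> payload, CancellationToken ct)
    {
        FlushResult result = await output.WriteAsync(payload, ct);

        if (result.IsCanceled)
        {
            // The flush wait was abandoned: anything still buffered may never reach the peer.
            return false;
        }

        // IsCompleted means the reading side of the pipe has finished; stop writing.
        return !result.IsCompleted;
    }
}
```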

ReubenBond (Member, Author)

Is it better to use ConnectionContext.DisposeAsync() to signal that the processing loops should terminate?

Member

Yes, for graceful termination if you don't want output truncation.

halter73 (Member) commented Oct 21, 2021

I think I misunderstood this. For the output loop, I expect that most of the time isn't spent waiting on flushing. If you are waiting on flushing and you need to immediately terminate the output loop even though this could cause truncation, just abort the connection. Otherwise, you should cancel this.outgoingMessages.Reader however this normally happens.

Calling CancelPendingRead() on the input is pretty common even in graceful shutdown scenarios. You shouldn't call ConnectionContext.DisposeAsync() until you're completely done with the Pipes.
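A hedged sketch of that pattern, with placeholder names rather than the actual Orleans loop: CancelPendingRead() surfaces as IsCanceled and lets the loop exit without an exception, and the ConnectionContext is disposed only after the loops are done with the pipes.

```csharp
using System.IO.Pipelines;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Connections;

internal static class GracefulShutdownSketch
{
    public static async Task ProcessIncomingAsync(ConnectionContext connection)
    {
        PipeReader input = connection.Transport.Input;
        while (true)
        {
            ReadResult result = await input.ReadAsync();
            if (result.IsCanceled)
            {
                // Someone called input.CancelPendingRead() to request a graceful stop.
                input.AdvanceTo(result.Buffer.Start);
                break;
            }

            // ... parse messages out of result.Buffer here ...
            input.AdvanceTo(result.Buffer.End);

            if (result.IsCompleted)
            {
                break; // the transport has finished writing into this pipe
            }
        }

        await input.CompleteAsync();
        // ConnectionContext.DisposeAsync() is deferred until both this loop and the
        // output loop are completely done with their pipes.
    }
}
```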

@@ -167,7 +167,7 @@ public static async Task LogException(this Task task, ILogger logger, ErrorCode
}
catch (Exception exc)
{
var ignored = task.Exception; // Observe exception
_ = task.Exception; // Observe exception
Member

Is this really necessary? I thought awaiting the task and catching the exception was enough to observe it.

ReubenBond (Member, Author)

tbh, I didn't even look. VS just barked at me to not have an unused variable. You're right, it's not at all necessary to access the .Exception prop here.
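A tiny sketch of the point being made, simplified from the extension method in the diff above:

```csharp
using System;
using System.Threading.Tasks;

internal static class ObserveSketch
{
    public static async Task LogExceptionSketch(Task task)
    {
        try
        {
            await task;
        }
        catch (Exception)
        {
            // Awaiting rethrows (and thereby observes) the task's exception, so
            // reading task.Exception afterwards adds nothing.
        }
    }
}
```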

halter73 (Member)

> The PR also brings DuplexPipeStream in line with ASP.NET Core's impl, since it has changed since we first forked it.

Seems like a good idea. For those not relying on CancelPendingRead() not throwing, it might be even better to build this on top of PipeReader.AsStream() and PipeWriter.AsStream().

Kestrel cannot do this in its connection middleware because it expects CancelPendingRead() not to result in OperationCanceledExceptions. Kestrel uses this when it wants to gracefully close connections due to server shutdown or idle connection timeouts. It seems like Orleans has similar requirements, so just keeping the DuplexPipeStream makes sense.
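For illustration, a minimal sketch of the AsStream-based alternative mentioned above (a hypothetical wrapper, not Kestrel's or Orleans' code). The caveat from the comment applies: CancelPendingRead() surfaces through AsStream() as an OperationCanceledException, which is why the forked DuplexPipeStream is kept.

```csharp
using System;
using System.IO;
using System.IO.Pipelines;
using System.Threading;
using System.Threading.Tasks;

// A duplex Stream built on PipeReader.AsStream()/PipeWriter.AsStream() instead of a
// hand-maintained DuplexPipeStream.
internal sealed class PipeDuplexStream : Stream
{
    private readonly Stream _input;
    private readonly Stream _output;

    public PipeDuplexStream(IDuplexPipe transport)
    {
        _input = transport.Input.AsStream(leaveOpen: true);
        _output = transport.Output.AsStream(leaveOpen: true);
    }

    public override bool CanRead => true;
    public override bool CanWrite => true;
    public override bool CanSeek => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }

    public override ValueTask<int> ReadAsync(Memory<byte> buffer, CancellationToken ct = default)
        => _input.ReadAsync(buffer, ct);

    public override ValueTask WriteAsync(ReadOnlyMemory<byte> buffer, CancellationToken ct = default)
        => _output.WriteAsync(buffer, ct);

    public override int Read(byte[] buffer, int offset, int count) => _input.Read(buffer, offset, count);
    public override void Write(byte[] buffer, int offset, int count) => _output.Write(buffer, offset, count);
    public override void Flush() => _output.Flush();
    public override Task FlushAsync(CancellationToken ct) => _output.FlushAsync(ct);
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
}
```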

Co-authored-by: Stephen Halter <halter73@gmail.com>

try
{
this.Context.Abort();
Member

Why is this necessary?

ReubenBond (Member, Author)

Yeah, though I wonder why.

Member

What happens if you don't? Does DisposeAsync() hang? If so, all the time?

ReubenBond (Member, Author)

It does, but it's a race. I see SocketConnection hanging in the ProcessSends loop.

Abort calls Output.CancelPendingRead(), which allows ProcessSends to complete.

Member

I don't see why Output.CancelPendingRead() needs to be called when you already completed the other side of the pipe with await transport.Output.CompleteAsync() above.

ReubenBond (Member, Author)

It doesn't make sense to me either yet. I have a memory dump if you're interested.

ReubenBond (Member, Author)

Worth noting that we're dealing with StreamPipeWriter/StreamPipeReader here, not the regular variants, so CompleteAsync calls through to https://github.com/dotnet/runtime/blob/f9507317fb68fa91851b52dca4a2b4b5edcd94be/src/libraries/System.IO.Pipelines/src/System/IO/Pipelines/StreamPipeWriter.cs#L231

ReubenBond (Member, Author)

Could the issue be the LeaveOpen option on StreamPipeReaderExtensions? Kestrel leaves it as true, by the looks of things: https://github.com/dotnet/aspnetcore/blob/4504ce5f8e2ecf36fb82c8b5233f44d9dfc01542/src/Servers/Kestrel/Core/src/Middleware/HttpsConnectionMiddleware.cs#L409
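For context, a small illustrative sketch of the option in question (not the Orleans code): with leaveOpen: true, completing the StreamPipeReader/StreamPipeWriter does not dispose the wrapped stream (e.g. the SslStream).

```csharp
using System.IO;
using System.IO.Pipelines;

// Positional record satisfying IDuplexPipe; completing these pipes will not dispose
// the underlying stream because of leaveOpen: true.
internal sealed record StreamDuplexPipe(PipeReader Input, PipeWriter Output) : IDuplexPipe
{
    public static StreamDuplexPipe Create(Stream stream) => new(
        PipeReader.Create(stream, new StreamPipeReaderOptions(leaveOpen: true)),
        PipeWriter.Create(stream, new StreamPipeWriterOptions(leaveOpen: true)));
}
```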

ReubenBond (Member, Author)

Looking at the core Stream.ReadAsync call inside StreamPipeReader: https://github.com/dotnet/runtime/blob/f9507317fb68fa91851b52dca4a2b4b5edcd94be/src/libraries/System.IO.Pipelines/src/System/IO/Pipelines/StreamPipeReader.cs#L241

int length = await reader.InnerStream.ReadAsync(buffer, tokenSource.Token).ConfigureAwait(false);

I see that tokenSource is set to InternalTokenSource and InternalTokenSource is canceled by either CancelPendingRead:

        /// <inheritdoc />
        public override void CancelPendingRead()
        {
            InternalTokenSource.Cancel();
        }

or by cancelling the CancellationToken passed to StreamPipeReader.ReadAsync (which we don't pass).

That seems to imply that calling CancelPendingRead on the StreamPipeReader is the correct way to wake up the loop and allow the read to terminate. Is that not how it was intended to be used?

If it is the intended use, is there a better way to call CancelPendingRead other than via ConnectionContext.Abort?
If not, what is the correct pattern? Maybe @davidfowl knows.

_cancelReadCalled = true;
try
{
_input.CancelPendingRead();
Member

Where is CancelPendingRead() being called from where the PipeReader might already be completed?

ReubenBond (Member, Author)

Is there any way to avoid a race that you can see?

Member

It depends on where it's being called. In Kestrel, we call CancelPendingRead() in a CancellationToken registration that gets disposed before completing the PipeReader. This guarantees CancelPendingRead() never gets called after the PipeReader is completed.
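A hedged sketch of that ordering (illustrative shape, not Kestrel's actual code): the registration that calls CancelPendingRead() is disposed before the reader is completed, so the callback can never target an already-completed PipeReader.

```csharp
using System.IO.Pipelines;
using System.Threading;
using System.Threading.Tasks;

internal static class RegistrationOrderingSketch
{
    public static async Task ReadUntilStoppedAsync(PipeReader input, CancellationToken stopping)
    {
        // While this registration is alive, a stop request merely cancels the pending read.
        using (stopping.Register(static r => ((PipeReader)r!).CancelPendingRead(), input))
        {
            while (true)
            {
                ReadResult result = await input.ReadAsync();
                if (result.IsCanceled || result.IsCompleted)
                {
                    input.AdvanceTo(result.Buffer.Start);
                    break;
                }

                // ... consume result.Buffer here ...
                input.AdvanceTo(result.Buffer.End);
            }
        } // the registration is disposed here, strictly before the reader is completed

        await input.CompleteAsync();
    }
}
```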

ReubenBond (Member, Author)

This particular method never actually gets called - same goes for its counterpart in Kestrel, AFAIK https://github.com/dotnet/aspnetcore/blob/main/src/Servers/Kestrel/Core/src/Middleware/HttpsConnectionMiddleware.cs

ReubenBond merged commit bbe4b42 into dotnet:3.x Nov 3, 2021
ReubenBond deleted the v3/fix-connection-shutdown branch November 3, 2021 19:36
@github-actions github-actions bot locked and limited conversation to collaborators Dec 2, 2023