New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix graceful connection shutdown #7345
Conversation
if (_processIncomingTask is not null && _processOutgoingTask is not null) | ||
{ | ||
// Wait for the transport to signal that it's closed before disposing it. | ||
await _transportConnectionClosed.Task; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The completion of this is signaled by a delegate registered to the Context.ConnectionClosed
CancellationToken
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should work. The ConnectionClosed
token won't fire until everything is flushed unless the connection is aborted.
@@ -116,6 +118,8 @@ protected virtual async Task RunInternal() | |||
_processIncomingTask = this.ProcessIncoming(); | |||
_processOutgoingTask = this.ProcessOutgoing(); | |||
await Task.WhenAll(_processIncomingTask, _processOutgoingTask); | |||
|
|||
Context.Abort(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait for the processing loops to complete before calling ConnectionContext.Abort()
.
Is there a better way to signal that the underlying transport should begin closing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though the processing tasks have completed here, that doesn't mean the transport is done flushing everything. Abort()
could still cause truncation of the response at least with Kestrel's transports. Is Orleans using both Kestrel and its own copy of Kestrel's Socket transport? It might be worth looking into SocketConnectionContextFactory which got added in .NET 6 RC1.
Maybe you should be calling DisposeAsync()
instead, but even that might be unnecessary since Kestrel will do this automatically when connection middleware completes. If this is always running as part of some sort of middleware, the middleware pipeline itself should probably take care of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is Orleans using both Kestrel and its own copy of Kestrel's Socket transport?
Yes, it is. Thanks, we might be able to use SocketConnectionContextFactory
once we add net6.0
to our targets
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does always run as a part of a middleware pipeline. The middleware was not exiting when TLS was added (eg,
orleans/src/Orleans.Connections.Security/Security/TlsServerConnectionMiddleware.cs
Line 202 in 24299be
await _next(context); |
{ | ||
try | ||
{ | ||
_transport?.Output.CancelPendingFlush(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this meant to be abortive? Why wouldn't this wait on any output backpressure? Note that because FlushAsync() will complete before everything is really flushed because of ResumeWriterThreshold. ConnectionContext.DisposeAsync()
shouldn't complete until everything is fully flushed at the application layer. The OS does it's own similar trickery with completing flushes before everything is acked, but that's unavoidable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it better to use ConnectionContext.DisposeAsync()
to signal that the processing loops should terminate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, for graceful termination if you don't want output truncation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I misunderstood this. For the output loop, I expect that most of the time isn't spent waiting on flushing. If you are waiting on flushing and you need to immediately terminate the output loop even though this could cause truncation, just abort the connection. Otherwise, you should cancel this.outgoingMessages.Reader
however this normally happens.
Calling CancelPendingRead()
on the input is pretty common even in graceful shutdown scenarios. You shouldn't call ConnectionContext.DisposeAsync()
until you're completely done with the Pipes.
@@ -167,7 +167,7 @@ public static async Task LogException(this Task task, ILogger logger, ErrorCode | |||
} | |||
catch (Exception exc) | |||
{ | |||
var ignored = task.Exception; // Observe exception | |||
_ = task.Exception; // Observe exception |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really necessary? I thought awaiting the task and catching the exception was enough to observe it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tbh, I didn't even look. VS just barked at me to not have an unused variable. You're right, it's not at all necessary to access the .Exception
prop here.
Seems like a good idea. For those not relying on Kestrel cannot in its connection middleware because it expects |
c7a71c0
to
7063230
Compare
Co-authored-by: Stephen Halter <halter73@gmail.com>
|
||
try | ||
{ | ||
this.Context.Abort(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, though I wonder why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if you don't? Does DisposeAsync() hang? If so, all the time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see why Output.CancelPendingRead()
needs to be called when you already completed the other side of the pipe with await transport.Output.CompleteAsync()
above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't make sense to me either yet. I have a memory dump if you're interested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth noting that we're dealing with StreamPipeWriter
/StreamPipeReader
here, not the regular variants, so CompleteAsync
calls through to https://github.com/dotnet/runtime/blob/f9507317fb68fa91851b52dca4a2b4b5edcd94be/src/libraries/System.IO.Pipelines/src/System/IO/Pipelines/StreamPipeWriter.cs#L231
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could the issue be the LeaveOpen
option on StreamPipeReaderExtensions
? Kestrel leave it as true, but the looks of things: https://github.com/dotnet/aspnetcore/blob/4504ce5f8e2ecf36fb82c8b5233f44d9dfc01542/src/Servers/Kestrel/Core/src/Middleware/HttpsConnectionMiddleware.cs#L409
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the core Stream.ReadAsync
call inside StreamPipeReader
: https://github.com/dotnet/runtime/blob/f9507317fb68fa91851b52dca4a2b4b5edcd94be/src/libraries/System.IO.Pipelines/src/System/IO/Pipelines/StreamPipeReader.cs#L241
int length = await reader.InnerStream.ReadAsync(buffer, tokenSource.Token).ConfigureAwait(false);
I see that tokenSource
is set to InternalTokenSource
and InternalTokenSource
is canceled by either CancelPendingRead
:
/// <inheritdoc />
public override void CancelPendingRead()
{
InternalTokenSource.Cancel();
}
or by cancelling the CancellationToken
passed to StreamPipeReader.ReadAsync
(which we don't pass).
That seems to imply that calling CancelPendingRead
on the StreamPipeReader
is the correct way to wake up the loop and allow the read to terminate. Is that not how it was intended to be used?
If it is the intended use, is there a better way to call CancelPendingRead
other than via ConnectionContext.Abort
?
If not, what is the correct pattern. Maybe @davidfowl knows.
_cancelReadCalled = true; | ||
try | ||
{ | ||
_input.CancelPendingRead(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is CancelPendingRead() being called from where the PipeReader might already be completed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any way to avoid a race that you can see?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It depends on where it's being called. In Kestrel, we call CancelPendingRead() in a CancellationToken registration that get's disposed before completing the PipeReader. This guarantees CancelPendingRead() never gets called after the PipeReader is completed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This particular method never actually gets called - same goes for its counterpart in Kestrel, AFAIK https://github.com/dotnet/aspnetcore/blob/main/src/Servers/Kestrel/Core/src/Middleware/HttpsConnectionMiddleware.cs
bcd9169
to
7b90ceb
Compare
This is a fix for #7337.
Essentially, there was a race during connection teardown in which the underlying buffers were being yanked from the incoming/outgoing processing loops and returned to the memory pool. The culprit is the call to {
PipeReader
,PipeWriter
}.Completed()
The PR also brings
DuplexPipeStream
inline with ASP.NET Core's impl, since it's changed since we first forked it.cc @halter73 @davidfowl - does this look about right to you? I'm explicitly calling
Context.Abort()
after the connection finishes its processing loops, since calling {PipeReader
,PipeWriter
}.Completed()
isn't enough for the TLS case. Is there something else that we should be doing instead?Microsoft Reviewers: Open in CodeFlow