Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SocketAsyncEngine not signaling ManualResetEvent #53921

Closed
flakey-bit opened this issue Jun 9, 2021 · 8 comments
Closed

SocketAsyncEngine not signaling ManualResetEvent #53921

flakey-bit opened this issue Jun 9, 2021 · 8 comments

Comments

@flakey-bit
Copy link

flakey-bit commented Jun 9, 2021

We're trying to diagnose a recurring production problem in a component that processes messages from Amazon SQS whereby it will stall (fail to process messages on the queue). Examination of a memory dump in the stalled state seems to indicate either a deadlock or missed notification.

  • It doesn't look like threadpool starvation (I'm aware I/O continuations run on the general TP queue)
  • It never recovers by itself (needs to be recycled)
  • It looks like the task that's supposed to be reading from SQS is blocked on reading the content stream from the HTTP request the AWS SDK makes.
  • It looks like we've got the response headers back from that request, but we're blocked reading the response body.
  • It looks like the event loop hasn't signalled the manual reset event

This problem has been going on for a while now across several of our components.

Possibly related:

netcoreapp3.1on Ubuntu 18 (Docker Container)

See dump analysis:

Consumer task (00007f8e081f4720) i.e. the one reading from SQS is tied up by AWS SDK Unmarshaller (comes directly after HTTP handler in the AWS pipeline). 
The unmarshaller task is currently executing. 

> dumpasync -tasks -addr 00007F8E08A2B3D0 -stacks
         Address               MT     Size      State Description
00007f8e08a2b3d0 00007f8e800e73a8      152         -1 Amazon.Runtime.Internal.Unmarshaller+<InvokeAsync>d__3`1[[System.__Canon, System.Private.CoreLib]]
Continuation chains:
.00007f8e08a2b468 (0) Amazon.SQS.Internal.ValidationResponseHandler+<InvokeAsync>d__1`1[[System.__Canon, System.Private.CoreLib]]
..00007f8e08a2b4f0 (0) Amazon.Runtime.Internal.ErrorHandler+<InvokeAsync>d__5`1[[System.__Canon, System.Private.CoreLib]]
...00007f8e08a2b578 (0) Amazon.Runtime.Internal.CallbackHandler+<InvokeAsync>d__9`1[[System.__Canon, System.Private.CoreLib]]
....00007f8e08a2b600 (0) Amazon.Runtime.Internal.EndpointDiscoveryHandler+<InvokeAsync>d__2`1[[System.__Canon, System.Private.CoreLib]]
.....00007f8e08a2b698 (1) Amazon.Runtime.Internal.CredentialsRetriever+<InvokeAsync>d__7`1[[System.__Canon, System.Private.CoreLib]]
......00007f8e08a2b738 (0) Amazon.Runtime.Internal.RetryHandler+<InvokeAsync>d__10`1[[System.__Canon, System.Private.CoreLib]]
.......00007f8e08a2b7f8 (0) Amazon.Runtime.Internal.CallbackHandler+<InvokeAsync>d__9`1[[System.__Canon, System.Private.CoreLib]]
........00007f8e08a2b880 (0) Amazon.Runtime.Internal.CallbackHandler+<InvokeAsync>d__9`1[[System.__Canon, System.Private.CoreLib]]
.........00007f8e08a2b908 (0) Amazon.Runtime.Internal.ErrorCallbackHandler+<InvokeAsync>d__5`1[[System.__Canon, System.Private.CoreLib]]
..........00007f8e08a2b990 (0) Amazon.Runtime.Internal.MetricsHandler+<InvokeAsync>d__1`1[[System.__Canon, System.Private.CoreLib]]
...........00007f8e081f4720 (0) Acme.Organisation.Processor.Queue.QueueService+<Write>d__6
............00007f8e081f4808 (0) Acme.Organisation.Processor.Queue.QueueService+<>c__DisplayClass5_1+<<ExecuteAsync>b__1>d
.............00007f8e081e2960 System.Threading.Tasks.UnwrapPromise`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib]]
..............00007f8e081f2618 System.Threading.Tasks.Task+WhenAllPromise
...............00007f8e081f2668 (0) Acme.Organisation.QueueService+<ExecuteAsync>d__5
--------------------------------------------------------------------------------

See callstack for thread 11. It's using the stream returned by the http handler.

> clrstack
OS Thread Id: 0x820 (11)
        Child SP               IP Call Site
00007F8DD7FFB810 00007f8ef802d00c [GCFrame: 00007f8dd7ffb810]
00007F8DD7FFB8F0 00007f8ef802d00c [HelperMethodFrame_1OBJ: 00007f8dd7ffb8f0] System.Threading.Monitor.ObjWait(Boolean, Int32, System.Object)
00007F8DD7FFBA20 00007F8E811A3A25 System.Threading.ManualResetEventSlim.Wait(Int32, System.Threading.CancellationToken) [/_/src/System.Private.CoreLib/shared/System/Threading/ManualResetEventSlim.cs @ 626]
00007F8DD7FFBAB0 00007F8E8124865F System.Net.Sockets.SocketAsyncContext.PerformSyncOperation[[System.__Canon, System.Private.CoreLib]](OperationQueue`1<System.__Canon> ByRef, System.__Canon, Int32, Int32) [/_/src/System.Net.Sockets/src/System/Net/Sockets/SocketAsyncContext.Unix.cs @ 1273]
00007F8DD7FFBB20 00007F8E81219D3C System.Net.Sockets.SocketAsyncContext.ReceiveFrom(System.Memory`1<Byte>, System.Net.Sockets.SocketFlags ByRef, Byte[], Int32 ByRef, Int32, Int32 ByRef) [/_/src/System.Net.Sockets/src/System/Net/Sockets/SocketAsyncContext.Unix.cs @ 1465]
00007F8DD7FFBBC0 00007F8E81219A9D System.Net.Sockets.SocketPal.Receive(System.Net.Sockets.SafeSocketHandle, Byte[], Int32, Int32, System.Net.Sockets.SocketFlags, Int32 ByRef) [/_/src/System.Net.Sockets/src/System/Net/Sockets/SocketPal.Unix.cs @ 983]
00007F8DD7FFBC60 00007F8E81219726 System.Net.Sockets.Socket.Receive(Byte[], Int32, Int32, System.Net.Sockets.SocketFlags, System.Net.Sockets.SocketError ByRef) [/_/src/System.Net.Sockets/src/System/Net/Sockets/Socket.cs @ 1386]
00007F8DD7FFBCB0 00007F8E81219340 System.Net.Sockets.NetworkStream.Read(Byte[], Int32, Int32) [/_/src/System.Net.Sockets/src/System/Net/Sockets/NetworkStream.cs @ 266]
00007F8DD7FFBD00 00007F8E8116E1B4 System.Net.Security.SslStream.FillBufferAsync[[System.Net.Security.SslStream+SslReadSync, System.Net.Security]](SslReadSync, Int32) [/_/src/System.Net.Security/src/System/Net/Security/SslStream.Implementation.cs @ 1444]
00007F8DD7FFBDA0 00007F8E8116EE93 System.Net.Security.SslStream+<ReadAsyncInternal>d__214`1[[System.Net.Security.SslStream+SslReadSync, System.Net.Security]].MoveNext() [/_/src/System.Net.Security/src/System/Net/Security/SslStream.Implementation.cs @ 1439]
00007F8DD7FFBF10 00007F8E8116E904 System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[System.Net.Security.SslStream+<ReadAsyncInternal>d__214`1[[System.Net.Security.SslStream+SslReadSync, System.Net.Security]], System.Net.Security]](<ReadAsyncInternal>d__214`1<SslReadSync> ByRef) [/_/src/System.Private.CoreLib/shared/System/Runtime/CompilerServices/AsyncMethodBuilder.cs @ 1036]
00007F8DD7FFBF50 00007F8E8116E82C System.Net.Security.SslStream.ReadAsyncInternal[[System.Net.Security.SslStream+SslReadSync, System.Net.Security]](SslReadSync, System.Memory`1<Byte>)
00007F8DD7FFC000 00007F8E8116E592 System.Net.Security.SslStream.Read(Byte[], Int32, Int32) [/_/src/System.Net.Security/src/System/Net/Security/SslStream.cs @ 753]
00007F8DD7FFC080 00007F8E8116E394 System.IO.Stream.Read(System.Span`1<Byte>) [/_/src/System.Private.CoreLib/shared/System/IO/Stream.cs @ 756]
00007F8DD7FFC0C0 00007F8E8116E056 System.Net.Http.HttpConnection.Read(System.Span`1<Byte>) [/_/src/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/HttpConnection.cs @ 1481]
00007F8DD7FFC130 00007F8E811350CE System.Net.Http.HttpConnection+ContentLengthReadStream.Read(System.Span`1<Byte>) [/_/src/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/ContentLengthReadStream.cs @ 26]
00007F8DD7FFC170 00007F8E81134F1F System.Net.Http.HttpBaseStream.Read(Byte[], Int32, Int32) [/_/src/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/HttpBaseStream.cs @ 91]
00007F8DD7FFC1B0 00007F8E81163F12 System.IO.StreamReader.ReadBuffer(System.Span`1<Char>, Boolean ByRef) [/_/src/System.Private.CoreLib/shared/System/IO/StreamReader.cs @ 658]
00007F8DD7FFC210 00007F8E8117F6DE System.IO.StreamReader.ReadSpan(System.Span`1<Char>) [/_/src/System.Private.CoreLib/shared/System/IO/StreamReader.cs @ 360]
00007F8DD7FFC270 00007F8E8117F4EC System.IO.StreamReader.Read(Char[], Int32, Int32) [/_/src/System.Private.CoreLib/shared/System/IO/StreamReader.cs @ 338]
00007F8DD7FFC2A0 00007F8E8117F400 System.Xml.XmlTextReaderImpl.ReadData() [/_/src/System.Private.Xml/src/System/Xml/Core/XmlTextReaderImpl.cs @ 3358]
00007F8DD7FFC2D0 00007F8E811AD151 System.Xml.XmlTextReaderImpl.InitTextReaderInput(System.String, System.Uri, System.IO.TextReader) [/_/src/System.Private.Xml/src/System/Xml/Core/XmlTextReaderImpl.cs @ 2940]
00007F8DD7FFC300 00007F8E811AD011 System.Xml.XmlTextReaderImpl.FinishInitTextReader() [/_/src/System.Private.Xml/src/System/Xml/Core/XmlTextReaderImpl.cs @ 784]
00007F8DD7FFC320 00007F8E811AC5BF System.Xml.XmlTextReaderImpl..ctor(System.IO.TextReader, System.Xml.XmlReaderSettings, System.String, System.Xml.XmlParserContext) [/_/src/System.Private.Xml/src/System/Xml/Core/XmlTextReaderImpl.cs @ 768]
00007F8DD7FFC360 00007F8E811AC452 System.Xml.XmlReaderSettings.CreateReader(System.IO.TextReader, System.String, System.Xml.XmlParserContext) [/_/src/System.Private.Xml/src/System/Xml/Core/XmlReaderSettings.cs @ 487]
00007F8DD7FFC3A0 00007F8E7FE11302 Amazon.Runtime.Internal.Transform.XmlUnmarshallerContext.Read()
00007F8DD7FFC400 00007F8E7FE1051A Amazon.SQS.Model.Internal.MarshallTransformations.ReceiveMessageResponseUnmarshaller.Unmarshall(Amazon.Runtime.Internal.Transform.XmlUnmarshallerContext)
00007F8DD7FFC440 00007F8E81194CFA Amazon.Runtime.Internal.Transform.XmlResponseUnmarshaller.Unmarshall(Amazon.Runtime.Internal.Transform.UnmarshallerContext)
00007F8DD7FFC470 00007F8E81182889 Amazon.Runtime.Internal.Transform.ResponseUnmarshaller.UnmarshallResponse(Amazon.Runtime.Internal.Transform.UnmarshallerContext)
... lines ommitted

Meanwhile, callstack for "event loop" 

> clrstack
OS Thread Id: 0x21 (10)
        Child SP               IP Call Site
00007F8E07FFC740 00007f8ef7c317ef [InlinedCallFrame: 00007f8e07ffc740] Interop+Sys.WaitForSocketEvents(IntPtr, SocketEvent*, Int32*)
00007F8E07FFC740 00007f8e7f7dc06b [InlinedCallFrame: 00007f8e07ffc740] Interop+Sys.WaitForSocketEvents(IntPtr, SocketEvent*, Int32*)
00007F8E07FFC730 00007F8E7F7DC06B ILStubClass.IL_STUB_PInvoke(IntPtr, SocketEvent*, Int32*)
00007F8E07FFC7C0 00007F8E7FB0D25C System.Net.Sockets.SocketAsyncEngine.EventLoop() [/_/src/System.Net.Sockets/src/System/Net/Sockets/SocketAsyncEngine.Unix.cs @ 311]
00007F8E07FFC810 00007F8E7FB0D219 System.Net.Sockets.SocketAsyncEngine+<>c.<.ctor>b__23_0(System.Object) [/_/src/System.Net.Sockets/src/System/Net/Sockets/SocketAsyncEngine.Unix.cs @ 285]
00007F8E07FFC820 00007F8E7D874883 System.Threading.Tasks.Task.InnerInvoke() [/_/src/System.Private.CoreLib/shared/System/Threading/Tasks/Task.cs @ 2445]
00007F8E07FFC840 00007F8E7D874691 System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef, System.Threading.Thread) [/_/src/System.Private.CoreLib/shared/System/Threading/Tasks/Task.cs @ 2378]
00007F8E07FFC8C0 00007F8E7D8744E8 System.Threading.Tasks.Task.ExecuteEntryUnsafe(System.Threading.Thread) [/_/src/System.Private.CoreLib/shared/System/Threading/Tasks/Task.cs @ 2321]
00007F8E07FFC8E0 00007F8E7D88146B System.Threading.Tasks.ThreadPoolTaskScheduler+<>c.<.cctor>b__10_0(System.Object) [/_/src/System.Private.CoreLib/shared/System/Threading/Tasks/ThreadPoolTaskScheduler.cs @ 36]
00007F8E07FFC8F0 00007F8E7D85544A System.Threading.ThreadHelper.ThreadStart(System.Object) [/_/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 81]
00007F8E07FFCC60 00007f8ef7142d5f [GCFrame: 00007f8e07ffcc60]
00007F8E07FFCD30 00007f8ef7142d5f [DebuggerU2MCatchHandlerFrame: 00007f8e07ffcd30]

ManualResetEventSlim shared:

> gcroot 00007f8e08a30058
Thread 21:
    00007F8E07FFC7C0 00007F8E7FB0D25C System.Net.Sockets.SocketAsyncEngine.EventLoop() [/_/src/System.Net.Sockets/src/System/Net/Sockets/SocketAsyncEngine.Unix.cs @ 311]
        rbx:
            ->  00007F8E081E02E8 System.Net.Sockets.SocketAsyncEngine
            ->  00007F8E081E0328 System.Collections.Concurrent.ConcurrentDictionary`2[[System.IntPtr, System.Private.CoreLib],[System.Net.Sockets.SocketAsyncContext, System.Net.Sockets]]
            ->  00007F8E08382040 System.Collections.Concurrent.ConcurrentDictionary`2+Tables[[System.IntPtr, System.Private.CoreLib],[System.Net.Sockets.SocketAsyncContext, System.Net.Sockets]]
            ->  00007F8E08381728 System.Collections.Concurrent.ConcurrentDictionary`2+Node[[System.IntPtr, System.Private.CoreLib],[System.Net.Sockets.SocketAsyncContext, System.Net.Sockets]][]
            ->  00007F8E089F0D20 System.Collections.Concurrent.ConcurrentDictionary`2+Node[[System.IntPtr, System.Private.CoreLib],[System.Net.Sockets.SocketAsyncContext, System.Net.Sockets]]
            ->  00007F8E089F0C48 System.Net.Sockets.SocketAsyncContext
            ->  00007F8E08A2FFE8 System.Net.Sockets.SocketAsyncContext+BufferMemoryReceiveOperation
            ->  00007F8E08A30058 System.Threading.ManualResetEventSlim

Thread 820:
    00007F8DD7FFBA20 00007F8E811A3A25 System.Threading.ManualResetEventSlim.Wait(Int32, System.Threading.CancellationToken) [/_/src/System.Private.CoreLib/shared/System/Threading/ManualResetEventSlim.cs @ 626]
        rbp-70: 00007f8dd7ffba30
            ->  00007F8E08A30058 System.Threading.ManualResetEventSlim

    00007F8DD7FFBAB0 00007F8E8124865F System.Net.Sockets.SocketAsyncContext.PerformSyncOperation[[System.__Canon, System.Private.CoreLib]](OperationQueue`1<System.__Canon> ByRef, System.__Canon, Int32, Int32) [/_/src/System.Net.Sockets/src/System/Net/Sockets/SocketAsyncContext.Unix.cs @ 1273]
        rbp-50: 00007f8dd7ffbac0
            ->  00007F8E08A30058 System.Threading.ManualResetEventSlim

EDIT: Dumped BufferMemoryReceiveOperation showing 0 bytes transferred so far

> dumpheap -type BufferMemoryReceiveOperation
         Address               MT     Size

00007f8e08a2ffe8 00007f8e7fcc2a48      112      => context: 00007f8e089f0c48 (SocketAsyncContext)

> do 00007f8e08a2ffe8
Name:        System.Net.Sockets.SocketAsyncContext+BufferMemoryReceiveOperation
MethodTable: 00007f8e7fcc2a48
EEClass:     00007f8e7fcb59e8
Size:        112(0x70) bytes
File:        /usr/share/dotnet/shared/Microsoft.NETCore.App/3.1.1/System.Net.Sockets.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
00007f8e7ddba0e8  4000298       28         System.Int32  1 instance                0 _state
00007f8e7fb8a6c8  4000299        8 ...ocketAsyncContext  0 instance 00007f8e089f0c48 AssociatedContext
00007f8e7fb8a000  400029a       10 ...xt+AsyncOperation  0 instance 00007f8e08a2ffe8 Next
00007f8e7d33c798  400029b       18        System.Object  0 instance 00007f8e08a30058 CallbackOrEvent
00007f8e7faacf48  400029c       2c         System.Int32  1 instance                0 ErrorCode
00007f8e7ddc14c0  400029d       20        System.Byte[]  0 instance 0000000000000000 SocketAddress
00007f8e7ddba0e8  400029e       30         System.Int32  1 instance                0 SocketAddressLen
00007f8e7e41b5c8  400029f       38 ...TokenRegistration  1 instance 00007f8e08a30020 CancellationRegistration
00007f8e7faad098  40002af       48         System.Int32  1 instance                0 Flags
00007f8e7faad098  40002b0       4c         System.Int32  1 instance                0 ReceivedFlags
00007f8e7ddba0e8  40002b1       50         System.Int32  1 instance                0 BytesTransferred
00007f8e7e66acb8  40002b2       58 ...Private.CoreLib]]  1 instance 00007f8e08a30040 Buffer      
@dotnet-issue-labeler dotnet-issue-labeler bot added area-System.Net untriaged New issue has not been triaged by the area owner labels Jun 9, 2021
@ghost
Copy link

ghost commented Jun 9, 2021

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

We're trying to diagnose a recurring production problem in a component that processes messages from Amazon SQS whereby it will stall (fail to process messages on the queue). Examination of a memory dump in the stalled state seems to indicate either a deadlock or missed notification.

  • It doesn't look like threadpool starvation (I'm aware I/O continuations run on the general TP queue)
  • It never recovers by itself (needs to be recycled)
  • It looks like the task that's supposed to be reading from SQS is blocked on reading the content stream from the HTTP request the AWS SDK makes.
  • It looks like we've got the response headers back from that request, but we're blocked reading the response body.
  • It looks like the event loop hasn't signalled the manual reset event

This problem has been going on for a while now across several of our components.

Possibly related:

netcoreapp3.1on Ubuntu 18 (Docker Container)

See dump analysis:

Consumer task (00007f8e081f4720) i.e. the one reading from SQS is tied up by AWS SDK Unmarshaller (comes directly after HTTP handler in the AWS pipeline). 
The unmarshaller task is currently executing. 

> dumpasync -tasks -addr 00007F8E08A2B3D0 -stacks
         Address               MT     Size      State Description
00007f8e08a2b3d0 00007f8e800e73a8      152         -1 Amazon.Runtime.Internal.Unmarshaller+<InvokeAsync>d__3`1[[System.__Canon, System.Private.CoreLib]]
Continuation chains:
.00007f8e08a2b468 (0) Amazon.SQS.Internal.ValidationResponseHandler+<InvokeAsync>d__1`1[[System.__Canon, System.Private.CoreLib]]
..00007f8e08a2b4f0 (0) Amazon.Runtime.Internal.ErrorHandler+<InvokeAsync>d__5`1[[System.__Canon, System.Private.CoreLib]]
...00007f8e08a2b578 (0) Amazon.Runtime.Internal.CallbackHandler+<InvokeAsync>d__9`1[[System.__Canon, System.Private.CoreLib]]
....00007f8e08a2b600 (0) Amazon.Runtime.Internal.EndpointDiscoveryHandler+<InvokeAsync>d__2`1[[System.__Canon, System.Private.CoreLib]]
.....00007f8e08a2b698 (1) Amazon.Runtime.Internal.CredentialsRetriever+<InvokeAsync>d__7`1[[System.__Canon, System.Private.CoreLib]]
......00007f8e08a2b738 (0) Amazon.Runtime.Internal.RetryHandler+<InvokeAsync>d__10`1[[System.__Canon, System.Private.CoreLib]]
.......00007f8e08a2b7f8 (0) Amazon.Runtime.Internal.CallbackHandler+<InvokeAsync>d__9`1[[System.__Canon, System.Private.CoreLib]]
........00007f8e08a2b880 (0) Amazon.Runtime.Internal.CallbackHandler+<InvokeAsync>d__9`1[[System.__Canon, System.Private.CoreLib]]
.........00007f8e08a2b908 (0) Amazon.Runtime.Internal.ErrorCallbackHandler+<InvokeAsync>d__5`1[[System.__Canon, System.Private.CoreLib]]
..........00007f8e08a2b990 (0) Amazon.Runtime.Internal.MetricsHandler+<InvokeAsync>d__1`1[[System.__Canon, System.Private.CoreLib]]
...........00007f8e081f4720 (0) Acme.Organisation.Processor.Queue.QueueService+<Write>d__6
............00007f8e081f4808 (0) Acme.Organisation.Processor.Queue.QueueService+<>c__DisplayClass5_1+<<ExecuteAsync>b__1>d
.............00007f8e081e2960 System.Threading.Tasks.UnwrapPromise`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib]]
..............00007f8e081f2618 System.Threading.Tasks.Task+WhenAllPromise
...............00007f8e081f2668 (0) Acme.Organisation.QueueService+<ExecuteAsync>d__5
--------------------------------------------------------------------------------

See callstack for thread 11. It's using the stream returned by the http handler.

> clrstack
OS Thread Id: 0x820 (11)
        Child SP               IP Call Site
00007F8DD7FFB810 00007f8ef802d00c [GCFrame: 00007f8dd7ffb810]
00007F8DD7FFB8F0 00007f8ef802d00c [HelperMethodFrame_1OBJ: 00007f8dd7ffb8f0] System.Threading.Monitor.ObjWait(Boolean, Int32, System.Object)
00007F8DD7FFBA20 00007F8E811A3A25 System.Threading.ManualResetEventSlim.Wait(Int32, System.Threading.CancellationToken) [/_/src/System.Private.CoreLib/shared/System/Threading/ManualResetEventSlim.cs @ 626]
00007F8DD7FFBAB0 00007F8E8124865F System.Net.Sockets.SocketAsyncContext.PerformSyncOperation[[System.__Canon, System.Private.CoreLib]](OperationQueue`1<System.__Canon> ByRef, System.__Canon, Int32, Int32) [/_/src/System.Net.Sockets/src/System/Net/Sockets/SocketAsyncContext.Unix.cs @ 1273]
00007F8DD7FFBB20 00007F8E81219D3C System.Net.Sockets.SocketAsyncContext.ReceiveFrom(System.Memory`1<Byte>, System.Net.Sockets.SocketFlags ByRef, Byte[], Int32 ByRef, Int32, Int32 ByRef) [/_/src/System.Net.Sockets/src/System/Net/Sockets/SocketAsyncContext.Unix.cs @ 1465]
00007F8DD7FFBBC0 00007F8E81219A9D System.Net.Sockets.SocketPal.Receive(System.Net.Sockets.SafeSocketHandle, Byte[], Int32, Int32, System.Net.Sockets.SocketFlags, Int32 ByRef) [/_/src/System.Net.Sockets/src/System/Net/Sockets/SocketPal.Unix.cs @ 983]
00007F8DD7FFBC60 00007F8E81219726 System.Net.Sockets.Socket.Receive(Byte[], Int32, Int32, System.Net.Sockets.SocketFlags, System.Net.Sockets.SocketError ByRef) [/_/src/System.Net.Sockets/src/System/Net/Sockets/Socket.cs @ 1386]
00007F8DD7FFBCB0 00007F8E81219340 System.Net.Sockets.NetworkStream.Read(Byte[], Int32, Int32) [/_/src/System.Net.Sockets/src/System/Net/Sockets/NetworkStream.cs @ 266]
00007F8DD7FFBD00 00007F8E8116E1B4 System.Net.Security.SslStream.FillBufferAsync[[System.Net.Security.SslStream+SslReadSync, System.Net.Security]](SslReadSync, Int32) [/_/src/System.Net.Security/src/System/Net/Security/SslStream.Implementation.cs @ 1444]
00007F8DD7FFBDA0 00007F8E8116EE93 System.Net.Security.SslStream+<ReadAsyncInternal>d__214`1[[System.Net.Security.SslStream+SslReadSync, System.Net.Security]].MoveNext() [/_/src/System.Net.Security/src/System/Net/Security/SslStream.Implementation.cs @ 1439]
00007F8DD7FFBF10 00007F8E8116E904 System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[System.Net.Security.SslStream+<ReadAsyncInternal>d__214`1[[System.Net.Security.SslStream+SslReadSync, System.Net.Security]], System.Net.Security]](<ReadAsyncInternal>d__214`1<SslReadSync> ByRef) [/_/src/System.Private.CoreLib/shared/System/Runtime/CompilerServices/AsyncMethodBuilder.cs @ 1036]
00007F8DD7FFBF50 00007F8E8116E82C System.Net.Security.SslStream.ReadAsyncInternal[[System.Net.Security.SslStream+SslReadSync, System.Net.Security]](SslReadSync, System.Memory`1<Byte>)
00007F8DD7FFC000 00007F8E8116E592 System.Net.Security.SslStream.Read(Byte[], Int32, Int32) [/_/src/System.Net.Security/src/System/Net/Security/SslStream.cs @ 753]
00007F8DD7FFC080 00007F8E8116E394 System.IO.Stream.Read(System.Span`1<Byte>) [/_/src/System.Private.CoreLib/shared/System/IO/Stream.cs @ 756]
00007F8DD7FFC0C0 00007F8E8116E056 System.Net.Http.HttpConnection.Read(System.Span`1<Byte>) [/_/src/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/HttpConnection.cs @ 1481]
00007F8DD7FFC130 00007F8E811350CE System.Net.Http.HttpConnection+ContentLengthReadStream.Read(System.Span`1<Byte>) [/_/src/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/ContentLengthReadStream.cs @ 26]
00007F8DD7FFC170 00007F8E81134F1F System.Net.Http.HttpBaseStream.Read(Byte[], Int32, Int32) [/_/src/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/HttpBaseStream.cs @ 91]
00007F8DD7FFC1B0 00007F8E81163F12 System.IO.StreamReader.ReadBuffer(System.Span`1<Char>, Boolean ByRef) [/_/src/System.Private.CoreLib/shared/System/IO/StreamReader.cs @ 658]
00007F8DD7FFC210 00007F8E8117F6DE System.IO.StreamReader.ReadSpan(System.Span`1<Char>) [/_/src/System.Private.CoreLib/shared/System/IO/StreamReader.cs @ 360]
00007F8DD7FFC270 00007F8E8117F4EC System.IO.StreamReader.Read(Char[], Int32, Int32) [/_/src/System.Private.CoreLib/shared/System/IO/StreamReader.cs @ 338]
00007F8DD7FFC2A0 00007F8E8117F400 System.Xml.XmlTextReaderImpl.ReadData() [/_/src/System.Private.Xml/src/System/Xml/Core/XmlTextReaderImpl.cs @ 3358]
00007F8DD7FFC2D0 00007F8E811AD151 System.Xml.XmlTextReaderImpl.InitTextReaderInput(System.String, System.Uri, System.IO.TextReader) [/_/src/System.Private.Xml/src/System/Xml/Core/XmlTextReaderImpl.cs @ 2940]
00007F8DD7FFC300 00007F8E811AD011 System.Xml.XmlTextReaderImpl.FinishInitTextReader() [/_/src/System.Private.Xml/src/System/Xml/Core/XmlTextReaderImpl.cs @ 784]
00007F8DD7FFC320 00007F8E811AC5BF System.Xml.XmlTextReaderImpl..ctor(System.IO.TextReader, System.Xml.XmlReaderSettings, System.String, System.Xml.XmlParserContext) [/_/src/System.Private.Xml/src/System/Xml/Core/XmlTextReaderImpl.cs @ 768]
00007F8DD7FFC360 00007F8E811AC452 System.Xml.XmlReaderSettings.CreateReader(System.IO.TextReader, System.String, System.Xml.XmlParserContext) [/_/src/System.Private.Xml/src/System/Xml/Core/XmlReaderSettings.cs @ 487]
00007F8DD7FFC3A0 00007F8E7FE11302 Amazon.Runtime.Internal.Transform.XmlUnmarshallerContext.Read()
00007F8DD7FFC400 00007F8E7FE1051A Amazon.SQS.Model.Internal.MarshallTransformations.ReceiveMessageResponseUnmarshaller.Unmarshall(Amazon.Runtime.Internal.Transform.XmlUnmarshallerContext)
00007F8DD7FFC440 00007F8E81194CFA Amazon.Runtime.Internal.Transform.XmlResponseUnmarshaller.Unmarshall(Amazon.Runtime.Internal.Transform.UnmarshallerContext)
00007F8DD7FFC470 00007F8E81182889 Amazon.Runtime.Internal.Transform.ResponseUnmarshaller.UnmarshallResponse(Amazon.Runtime.Internal.Transform.UnmarshallerContext)
... lines ommitted

Meanwhile, callstack for "event loop" 

> clrstack
OS Thread Id: 0x21 (10)
        Child SP               IP Call Site
00007F8E07FFC740 00007f8ef7c317ef [InlinedCallFrame: 00007f8e07ffc740] Interop+Sys.WaitForSocketEvents(IntPtr, SocketEvent*, Int32*)
00007F8E07FFC740 00007f8e7f7dc06b [InlinedCallFrame: 00007f8e07ffc740] Interop+Sys.WaitForSocketEvents(IntPtr, SocketEvent*, Int32*)
00007F8E07FFC730 00007F8E7F7DC06B ILStubClass.IL_STUB_PInvoke(IntPtr, SocketEvent*, Int32*)
00007F8E07FFC7C0 00007F8E7FB0D25C System.Net.Sockets.SocketAsyncEngine.EventLoop() [/_/src/System.Net.Sockets/src/System/Net/Sockets/SocketAsyncEngine.Unix.cs @ 311]
00007F8E07FFC810 00007F8E7FB0D219 System.Net.Sockets.SocketAsyncEngine+<>c.<.ctor>b__23_0(System.Object) [/_/src/System.Net.Sockets/src/System/Net/Sockets/SocketAsyncEngine.Unix.cs @ 285]
00007F8E07FFC820 00007F8E7D874883 System.Threading.Tasks.Task.InnerInvoke() [/_/src/System.Private.CoreLib/shared/System/Threading/Tasks/Task.cs @ 2445]
00007F8E07FFC840 00007F8E7D874691 System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef, System.Threading.Thread) [/_/src/System.Private.CoreLib/shared/System/Threading/Tasks/Task.cs @ 2378]
00007F8E07FFC8C0 00007F8E7D8744E8 System.Threading.Tasks.Task.ExecuteEntryUnsafe(System.Threading.Thread) [/_/src/System.Private.CoreLib/shared/System/Threading/Tasks/Task.cs @ 2321]
00007F8E07FFC8E0 00007F8E7D88146B System.Threading.Tasks.ThreadPoolTaskScheduler+<>c.<.cctor>b__10_0(System.Object) [/_/src/System.Private.CoreLib/shared/System/Threading/Tasks/ThreadPoolTaskScheduler.cs @ 36]
00007F8E07FFC8F0 00007F8E7D85544A System.Threading.ThreadHelper.ThreadStart(System.Object) [/_/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 81]
00007F8E07FFCC60 00007f8ef7142d5f [GCFrame: 00007f8e07ffcc60]
00007F8E07FFCD30 00007f8ef7142d5f [DebuggerU2MCatchHandlerFrame: 00007f8e07ffcd30]

ManualResetEventSlim shared:

> gcroot 00007f8e08a30058
Thread 21:
    00007F8E07FFC7C0 00007F8E7FB0D25C System.Net.Sockets.SocketAsyncEngine.EventLoop() [/_/src/System.Net.Sockets/src/System/Net/Sockets/SocketAsyncEngine.Unix.cs @ 311]
        rbx:
            ->  00007F8E081E02E8 System.Net.Sockets.SocketAsyncEngine
            ->  00007F8E081E0328 System.Collections.Concurrent.ConcurrentDictionary`2[[System.IntPtr, System.Private.CoreLib],[System.Net.Sockets.SocketAsyncContext, System.Net.Sockets]]
            ->  00007F8E08382040 System.Collections.Concurrent.ConcurrentDictionary`2+Tables[[System.IntPtr, System.Private.CoreLib],[System.Net.Sockets.SocketAsyncContext, System.Net.Sockets]]
            ->  00007F8E08381728 System.Collections.Concurrent.ConcurrentDictionary`2+Node[[System.IntPtr, System.Private.CoreLib],[System.Net.Sockets.SocketAsyncContext, System.Net.Sockets]][]
            ->  00007F8E089F0D20 System.Collections.Concurrent.ConcurrentDictionary`2+Node[[System.IntPtr, System.Private.CoreLib],[System.Net.Sockets.SocketAsyncContext, System.Net.Sockets]]
            ->  00007F8E089F0C48 System.Net.Sockets.SocketAsyncContext
            ->  00007F8E08A2FFE8 System.Net.Sockets.SocketAsyncContext+BufferMemoryReceiveOperation
            ->  00007F8E08A30058 System.Threading.ManualResetEventSlim

Thread 820:
    00007F8DD7FFBA20 00007F8E811A3A25 System.Threading.ManualResetEventSlim.Wait(Int32, System.Threading.CancellationToken) [/_/src/System.Private.CoreLib/shared/System/Threading/ManualResetEventSlim.cs @ 626]
        rbp-70: 00007f8dd7ffba30
            ->  00007F8E08A30058 System.Threading.ManualResetEventSlim

    00007F8DD7FFBAB0 00007F8E8124865F System.Net.Sockets.SocketAsyncContext.PerformSyncOperation[[System.__Canon, System.Private.CoreLib]](OperationQueue`1<System.__Canon> ByRef, System.__Canon, Int32, Int32) [/_/src/System.Net.Sockets/src/System/Net/Sockets/SocketAsyncContext.Unix.cs @ 1273]
        rbp-50: 00007f8dd7ffbac0
            ->  00007F8E08A30058 System.Threading.ManualResetEventSlim
Author: flakey-bit
Assignees: -
Labels:

area-System.Net, untriaged

Milestone: -

@flakey-bit flakey-bit changed the title Deadlock from SocketAsyncEngine not signaling ManualResetEvent SocketAsyncEngine not signaling ManualResetEvent Jun 9, 2021
@antonfirsov
Copy link
Member

@flakey-bit this is very likely a duplicate of #46505 and #31570 which is reported to be gone in .NET 5.0. Any chance you can upgrade?

@ghost
Copy link

ghost commented Jun 9, 2021

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

We're trying to diagnose a recurring production problem in a component that processes messages from Amazon SQS whereby it will stall (fail to process messages on the queue). Examination of a memory dump in the stalled state seems to indicate either a deadlock or missed notification.

  • It doesn't look like threadpool starvation (I'm aware I/O continuations run on the general TP queue)
  • It never recovers by itself (needs to be recycled)
  • It looks like the task that's supposed to be reading from SQS is blocked on reading the content stream from the HTTP request the AWS SDK makes.
  • It looks like we've got the response headers back from that request, but we're blocked reading the response body.
  • It looks like the event loop hasn't signalled the manual reset event

This problem has been going on for a while now across several of our components.

Possibly related:

netcoreapp3.1on Ubuntu 18 (Docker Container)

See dump analysis:

Consumer task (00007f8e081f4720) i.e. the one reading from SQS is tied up by AWS SDK Unmarshaller (comes directly after HTTP handler in the AWS pipeline). 
The unmarshaller task is currently executing. 

> dumpasync -tasks -addr 00007F8E08A2B3D0 -stacks
         Address               MT     Size      State Description
00007f8e08a2b3d0 00007f8e800e73a8      152         -1 Amazon.Runtime.Internal.Unmarshaller+<InvokeAsync>d__3`1[[System.__Canon, System.Private.CoreLib]]
Continuation chains:
.00007f8e08a2b468 (0) Amazon.SQS.Internal.ValidationResponseHandler+<InvokeAsync>d__1`1[[System.__Canon, System.Private.CoreLib]]
..00007f8e08a2b4f0 (0) Amazon.Runtime.Internal.ErrorHandler+<InvokeAsync>d__5`1[[System.__Canon, System.Private.CoreLib]]
...00007f8e08a2b578 (0) Amazon.Runtime.Internal.CallbackHandler+<InvokeAsync>d__9`1[[System.__Canon, System.Private.CoreLib]]
....00007f8e08a2b600 (0) Amazon.Runtime.Internal.EndpointDiscoveryHandler+<InvokeAsync>d__2`1[[System.__Canon, System.Private.CoreLib]]
.....00007f8e08a2b698 (1) Amazon.Runtime.Internal.CredentialsRetriever+<InvokeAsync>d__7`1[[System.__Canon, System.Private.CoreLib]]
......00007f8e08a2b738 (0) Amazon.Runtime.Internal.RetryHandler+<InvokeAsync>d__10`1[[System.__Canon, System.Private.CoreLib]]
.......00007f8e08a2b7f8 (0) Amazon.Runtime.Internal.CallbackHandler+<InvokeAsync>d__9`1[[System.__Canon, System.Private.CoreLib]]
........00007f8e08a2b880 (0) Amazon.Runtime.Internal.CallbackHandler+<InvokeAsync>d__9`1[[System.__Canon, System.Private.CoreLib]]
.........00007f8e08a2b908 (0) Amazon.Runtime.Internal.ErrorCallbackHandler+<InvokeAsync>d__5`1[[System.__Canon, System.Private.CoreLib]]
..........00007f8e08a2b990 (0) Amazon.Runtime.Internal.MetricsHandler+<InvokeAsync>d__1`1[[System.__Canon, System.Private.CoreLib]]
...........00007f8e081f4720 (0) Acme.Organisation.Processor.Queue.QueueService+<Write>d__6
............00007f8e081f4808 (0) Acme.Organisation.Processor.Queue.QueueService+<>c__DisplayClass5_1+<<ExecuteAsync>b__1>d
.............00007f8e081e2960 System.Threading.Tasks.UnwrapPromise`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib]]
..............00007f8e081f2618 System.Threading.Tasks.Task+WhenAllPromise
...............00007f8e081f2668 (0) Acme.Organisation.QueueService+<ExecuteAsync>d__5
--------------------------------------------------------------------------------

See callstack for thread 11. It's using the stream returned by the http handler.

> clrstack
OS Thread Id: 0x820 (11)
        Child SP               IP Call Site
00007F8DD7FFB810 00007f8ef802d00c [GCFrame: 00007f8dd7ffb810]
00007F8DD7FFB8F0 00007f8ef802d00c [HelperMethodFrame_1OBJ: 00007f8dd7ffb8f0] System.Threading.Monitor.ObjWait(Boolean, Int32, System.Object)
00007F8DD7FFBA20 00007F8E811A3A25 System.Threading.ManualResetEventSlim.Wait(Int32, System.Threading.CancellationToken) [/_/src/System.Private.CoreLib/shared/System/Threading/ManualResetEventSlim.cs @ 626]
00007F8DD7FFBAB0 00007F8E8124865F System.Net.Sockets.SocketAsyncContext.PerformSyncOperation[[System.__Canon, System.Private.CoreLib]](OperationQueue`1<System.__Canon> ByRef, System.__Canon, Int32, Int32) [/_/src/System.Net.Sockets/src/System/Net/Sockets/SocketAsyncContext.Unix.cs @ 1273]
00007F8DD7FFBB20 00007F8E81219D3C System.Net.Sockets.SocketAsyncContext.ReceiveFrom(System.Memory`1<Byte>, System.Net.Sockets.SocketFlags ByRef, Byte[], Int32 ByRef, Int32, Int32 ByRef) [/_/src/System.Net.Sockets/src/System/Net/Sockets/SocketAsyncContext.Unix.cs @ 1465]
00007F8DD7FFBBC0 00007F8E81219A9D System.Net.Sockets.SocketPal.Receive(System.Net.Sockets.SafeSocketHandle, Byte[], Int32, Int32, System.Net.Sockets.SocketFlags, Int32 ByRef) [/_/src/System.Net.Sockets/src/System/Net/Sockets/SocketPal.Unix.cs @ 983]
00007F8DD7FFBC60 00007F8E81219726 System.Net.Sockets.Socket.Receive(Byte[], Int32, Int32, System.Net.Sockets.SocketFlags, System.Net.Sockets.SocketError ByRef) [/_/src/System.Net.Sockets/src/System/Net/Sockets/Socket.cs @ 1386]
00007F8DD7FFBCB0 00007F8E81219340 System.Net.Sockets.NetworkStream.Read(Byte[], Int32, Int32) [/_/src/System.Net.Sockets/src/System/Net/Sockets/NetworkStream.cs @ 266]
00007F8DD7FFBD00 00007F8E8116E1B4 System.Net.Security.SslStream.FillBufferAsync[[System.Net.Security.SslStream+SslReadSync, System.Net.Security]](SslReadSync, Int32) [/_/src/System.Net.Security/src/System/Net/Security/SslStream.Implementation.cs @ 1444]
00007F8DD7FFBDA0 00007F8E8116EE93 System.Net.Security.SslStream+<ReadAsyncInternal>d__214`1[[System.Net.Security.SslStream+SslReadSync, System.Net.Security]].MoveNext() [/_/src/System.Net.Security/src/System/Net/Security/SslStream.Implementation.cs @ 1439]
00007F8DD7FFBF10 00007F8E8116E904 System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[System.Net.Security.SslStream+<ReadAsyncInternal>d__214`1[[System.Net.Security.SslStream+SslReadSync, System.Net.Security]], System.Net.Security]](<ReadAsyncInternal>d__214`1<SslReadSync> ByRef) [/_/src/System.Private.CoreLib/shared/System/Runtime/CompilerServices/AsyncMethodBuilder.cs @ 1036]
00007F8DD7FFBF50 00007F8E8116E82C System.Net.Security.SslStream.ReadAsyncInternal[[System.Net.Security.SslStream+SslReadSync, System.Net.Security]](SslReadSync, System.Memory`1<Byte>)
00007F8DD7FFC000 00007F8E8116E592 System.Net.Security.SslStream.Read(Byte[], Int32, Int32) [/_/src/System.Net.Security/src/System/Net/Security/SslStream.cs @ 753]
00007F8DD7FFC080 00007F8E8116E394 System.IO.Stream.Read(System.Span`1<Byte>) [/_/src/System.Private.CoreLib/shared/System/IO/Stream.cs @ 756]
00007F8DD7FFC0C0 00007F8E8116E056 System.Net.Http.HttpConnection.Read(System.Span`1<Byte>) [/_/src/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/HttpConnection.cs @ 1481]
00007F8DD7FFC130 00007F8E811350CE System.Net.Http.HttpConnection+ContentLengthReadStream.Read(System.Span`1<Byte>) [/_/src/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/ContentLengthReadStream.cs @ 26]
00007F8DD7FFC170 00007F8E81134F1F System.Net.Http.HttpBaseStream.Read(Byte[], Int32, Int32) [/_/src/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/HttpBaseStream.cs @ 91]
00007F8DD7FFC1B0 00007F8E81163F12 System.IO.StreamReader.ReadBuffer(System.Span`1<Char>, Boolean ByRef) [/_/src/System.Private.CoreLib/shared/System/IO/StreamReader.cs @ 658]
00007F8DD7FFC210 00007F8E8117F6DE System.IO.StreamReader.ReadSpan(System.Span`1<Char>) [/_/src/System.Private.CoreLib/shared/System/IO/StreamReader.cs @ 360]
00007F8DD7FFC270 00007F8E8117F4EC System.IO.StreamReader.Read(Char[], Int32, Int32) [/_/src/System.Private.CoreLib/shared/System/IO/StreamReader.cs @ 338]
00007F8DD7FFC2A0 00007F8E8117F400 System.Xml.XmlTextReaderImpl.ReadData() [/_/src/System.Private.Xml/src/System/Xml/Core/XmlTextReaderImpl.cs @ 3358]
00007F8DD7FFC2D0 00007F8E811AD151 System.Xml.XmlTextReaderImpl.InitTextReaderInput(System.String, System.Uri, System.IO.TextReader) [/_/src/System.Private.Xml/src/System/Xml/Core/XmlTextReaderImpl.cs @ 2940]
00007F8DD7FFC300 00007F8E811AD011 System.Xml.XmlTextReaderImpl.FinishInitTextReader() [/_/src/System.Private.Xml/src/System/Xml/Core/XmlTextReaderImpl.cs @ 784]
00007F8DD7FFC320 00007F8E811AC5BF System.Xml.XmlTextReaderImpl..ctor(System.IO.TextReader, System.Xml.XmlReaderSettings, System.String, System.Xml.XmlParserContext) [/_/src/System.Private.Xml/src/System/Xml/Core/XmlTextReaderImpl.cs @ 768]
00007F8DD7FFC360 00007F8E811AC452 System.Xml.XmlReaderSettings.CreateReader(System.IO.TextReader, System.String, System.Xml.XmlParserContext) [/_/src/System.Private.Xml/src/System/Xml/Core/XmlReaderSettings.cs @ 487]
00007F8DD7FFC3A0 00007F8E7FE11302 Amazon.Runtime.Internal.Transform.XmlUnmarshallerContext.Read()
00007F8DD7FFC400 00007F8E7FE1051A Amazon.SQS.Model.Internal.MarshallTransformations.ReceiveMessageResponseUnmarshaller.Unmarshall(Amazon.Runtime.Internal.Transform.XmlUnmarshallerContext)
00007F8DD7FFC440 00007F8E81194CFA Amazon.Runtime.Internal.Transform.XmlResponseUnmarshaller.Unmarshall(Amazon.Runtime.Internal.Transform.UnmarshallerContext)
00007F8DD7FFC470 00007F8E81182889 Amazon.Runtime.Internal.Transform.ResponseUnmarshaller.UnmarshallResponse(Amazon.Runtime.Internal.Transform.UnmarshallerContext)
... lines ommitted

Meanwhile, callstack for "event loop" 

> clrstack
OS Thread Id: 0x21 (10)
        Child SP               IP Call Site
00007F8E07FFC740 00007f8ef7c317ef [InlinedCallFrame: 00007f8e07ffc740] Interop+Sys.WaitForSocketEvents(IntPtr, SocketEvent*, Int32*)
00007F8E07FFC740 00007f8e7f7dc06b [InlinedCallFrame: 00007f8e07ffc740] Interop+Sys.WaitForSocketEvents(IntPtr, SocketEvent*, Int32*)
00007F8E07FFC730 00007F8E7F7DC06B ILStubClass.IL_STUB_PInvoke(IntPtr, SocketEvent*, Int32*)
00007F8E07FFC7C0 00007F8E7FB0D25C System.Net.Sockets.SocketAsyncEngine.EventLoop() [/_/src/System.Net.Sockets/src/System/Net/Sockets/SocketAsyncEngine.Unix.cs @ 311]
00007F8E07FFC810 00007F8E7FB0D219 System.Net.Sockets.SocketAsyncEngine+<>c.<.ctor>b__23_0(System.Object) [/_/src/System.Net.Sockets/src/System/Net/Sockets/SocketAsyncEngine.Unix.cs @ 285]
00007F8E07FFC820 00007F8E7D874883 System.Threading.Tasks.Task.InnerInvoke() [/_/src/System.Private.CoreLib/shared/System/Threading/Tasks/Task.cs @ 2445]
00007F8E07FFC840 00007F8E7D874691 System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef, System.Threading.Thread) [/_/src/System.Private.CoreLib/shared/System/Threading/Tasks/Task.cs @ 2378]
00007F8E07FFC8C0 00007F8E7D8744E8 System.Threading.Tasks.Task.ExecuteEntryUnsafe(System.Threading.Thread) [/_/src/System.Private.CoreLib/shared/System/Threading/Tasks/Task.cs @ 2321]
00007F8E07FFC8E0 00007F8E7D88146B System.Threading.Tasks.ThreadPoolTaskScheduler+<>c.<.cctor>b__10_0(System.Object) [/_/src/System.Private.CoreLib/shared/System/Threading/Tasks/ThreadPoolTaskScheduler.cs @ 36]
00007F8E07FFC8F0 00007F8E7D85544A System.Threading.ThreadHelper.ThreadStart(System.Object) [/_/src/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 81]
00007F8E07FFCC60 00007f8ef7142d5f [GCFrame: 00007f8e07ffcc60]
00007F8E07FFCD30 00007f8ef7142d5f [DebuggerU2MCatchHandlerFrame: 00007f8e07ffcd30]

ManualResetEventSlim shared:

> gcroot 00007f8e08a30058
Thread 21:
    00007F8E07FFC7C0 00007F8E7FB0D25C System.Net.Sockets.SocketAsyncEngine.EventLoop() [/_/src/System.Net.Sockets/src/System/Net/Sockets/SocketAsyncEngine.Unix.cs @ 311]
        rbx:
            ->  00007F8E081E02E8 System.Net.Sockets.SocketAsyncEngine
            ->  00007F8E081E0328 System.Collections.Concurrent.ConcurrentDictionary`2[[System.IntPtr, System.Private.CoreLib],[System.Net.Sockets.SocketAsyncContext, System.Net.Sockets]]
            ->  00007F8E08382040 System.Collections.Concurrent.ConcurrentDictionary`2+Tables[[System.IntPtr, System.Private.CoreLib],[System.Net.Sockets.SocketAsyncContext, System.Net.Sockets]]
            ->  00007F8E08381728 System.Collections.Concurrent.ConcurrentDictionary`2+Node[[System.IntPtr, System.Private.CoreLib],[System.Net.Sockets.SocketAsyncContext, System.Net.Sockets]][]
            ->  00007F8E089F0D20 System.Collections.Concurrent.ConcurrentDictionary`2+Node[[System.IntPtr, System.Private.CoreLib],[System.Net.Sockets.SocketAsyncContext, System.Net.Sockets]]
            ->  00007F8E089F0C48 System.Net.Sockets.SocketAsyncContext
            ->  00007F8E08A2FFE8 System.Net.Sockets.SocketAsyncContext+BufferMemoryReceiveOperation
            ->  00007F8E08A30058 System.Threading.ManualResetEventSlim

Thread 820:
    00007F8DD7FFBA20 00007F8E811A3A25 System.Threading.ManualResetEventSlim.Wait(Int32, System.Threading.CancellationToken) [/_/src/System.Private.CoreLib/shared/System/Threading/ManualResetEventSlim.cs @ 626]
        rbp-70: 00007f8dd7ffba30
            ->  00007F8E08A30058 System.Threading.ManualResetEventSlim

    00007F8DD7FFBAB0 00007F8E8124865F System.Net.Sockets.SocketAsyncContext.PerformSyncOperation[[System.__Canon, System.Private.CoreLib]](OperationQueue`1<System.__Canon> ByRef, System.__Canon, Int32, Int32) [/_/src/System.Net.Sockets/src/System/Net/Sockets/SocketAsyncContext.Unix.cs @ 1273]
        rbp-50: 00007f8dd7ffbac0
            ->  00007F8E08A30058 System.Threading.ManualResetEventSlim
Author: flakey-bit
Assignees: -
Labels:

area-System.Net.Sockets, untriaged

Milestone: -

@flakey-bit
Copy link
Author

@antonfirsov I'd seen those two issues already, but thanks for highlighting.

  • Those issues both seem to involve stress/DOS attacks, which I don't think applies to us (not public facing)
  • I believe we've seen a similar issue under a different component running on dotnet 5 (however don't have a supporting dump)

3.1 is under LTS right?

@antonfirsov
Copy link
Member

After a second look I realized this is indeed a different issue (#46505 involves cancellation), my apologies. From what I see here this might be an issue calling a synchronous Receive on an already asynchronous (non-blocking) socket.

Any chance you can isolate a repro? It is really hard (if not impossible) to make this actionable without that.

@stephentoub
Copy link
Member

Any chance you have a network trace to go along with this? The stacks suggest that the response had a Content-Length header and the code is waiting for as much data to arrive as was promised. It could, for example, be an issue of the server simply not sending as much as was promised, or something along those lines.

@flakey-bit
Copy link
Author

Thanks for taking a look.

@antonfirsov, we don't have an isolated repro but will try and work on getting one.

It will be difficult, because we don't really have any idea what triggers it. It could, as Stephen says simply be the server didn't send us data. It could also be environmental or temporary loss of network connectivity.

@stephentoub, we don't have a network trace unfortunately. Things can run fine for weeks before stalling occurs. I think we'll look at capturing network traces on an ongoing basis / circular so that we have them available for next time the issue occurs.

I've dumped the corresponding BufferMemoryReceiveOperation, which shows 0 bytes transferred so far.

Other (potentially) relevant information:
We're using the AWS SQS SDK "Long Polling" feature (it's application layer but thought I'd mention it anyway).

@karelz karelz added api-needs-work API needs work before it is approved, it is NOT ready for implementation needs more info and removed untriaged New issue has not been triaged by the area owner labels Jun 15, 2021
@karelz karelz added this to the Future milestone Jun 15, 2021
@karelz karelz removed the api-needs-work API needs work before it is approved, it is NOT ready for implementation label Jun 15, 2021
@karelz
Copy link
Member

karelz commented Aug 20, 2021

Not actionable at the moment. Closing. Feel free to reopen if/when more info is available.

@karelz karelz closed this as completed Aug 20, 2021
@karelz karelz modified the milestones: Future, 7.0.0 Aug 20, 2021
@dotnet dotnet locked as resolved and limited conversation to collaborators Sep 19, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants