Skip to content
This repository has been archived by the owner on Jul 19, 2024. It is now read-only.

System.Net.Sockets.SocketException - Dropping Messages? #43

Closed
thomasmichaelwallace opened this issue Jan 22, 2020 · 9 comments
Closed

Comments

@thomasmichaelwallace
Copy link

(I'm still finding my way around Azure, so apologise if this is not the approrpiate place to raise this issue).

We have a Node.js Function App configured with an Azure Event Hubs trigger.

Because this is a fairly critical piece of infrastructure to us, we are alerting on the Application Insights query (exceptions where severityLevel > 1) > 0.

Last night we received 14 alerts of: Error processing event from Partition Id: '#', Owner: '#uuid#', EventHubPath: '#consumer-group#'

I'm having trouble deciphering the error messages- but does this mean that the extension dropped 14 messages?

I'm happy to provide any more information, the most pertenant for this project seems to be the parsedStack:

{
  "parsedStack": [
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
      "level": 0,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess",
      "level": 1,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification",
      "level": 2,
      "line": 0
    },
    {
      "assembly": "Microsoft.Azure.EventHubs, Version=2.1.0.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c",
      "method": "Microsoft.Azure.EventHubs.Amqp.AmqpEventHubClient+<CreateConnectionAsync>d__32.MoveNext",
      "level": 3,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
      "level": 4,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess",
      "level": 5,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification",
      "level": 6,
      "line": 0
    },
    {
      "assembly": "Microsoft.Azure.Amqp, Version=2.4.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35",
      "method": "Microsoft.Azure.Amqp.FaultTolerantAmqpObject`1+<OnCreateAsync>d__6.MoveNext",
      "level": 7,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
      "level": 8,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess",
      "level": 9,
      "line": 0
    },
    {
      "assembly": "Microsoft.Azure.Amqp, Version=2.4.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35",
      "method": "Microsoft.Azure.Amqp.Singleton`1+<GetOrCreateAsync>d__13.MoveNext",
      "level": 10,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
      "level": 11,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess",
      "level": 12,
      "line": 0
    },
    {
      "assembly": "Microsoft.Azure.Amqp, Version=2.4.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35",
      "method": "Microsoft.Azure.Amqp.Singleton`1+<GetOrCreateAsync>d__13.MoveNext",
      "level": 13,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
      "level": 14,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess",
      "level": 15,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification",
      "level": 16,
      "line": 0
    },
    {
      "assembly": "Microsoft.Azure.EventHubs, Version=2.1.0.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c",
      "method": "Microsoft.Azure.EventHubs.Amqp.AmqpPartitionReceiver+<CreateLinkAsync>d__15.MoveNext",
      "level": 17,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
      "level": 18,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess",
      "level": 19,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification",
      "level": 20,
      "line": 0
    },
    {
      "assembly": "Microsoft.Azure.Amqp, Version=2.4.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35",
      "method": "Microsoft.Azure.Amqp.FaultTolerantAmqpObject`1+<OnCreateAsync>d__6.MoveNext",
      "level": 21,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
      "level": 22,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess",
      "level": 23,
      "line": 0
    },
    {
      "assembly": "Microsoft.Azure.Amqp, Version=2.4.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35",
      "method": "Microsoft.Azure.Amqp.Singleton`1+<GetOrCreateAsync>d__13.MoveNext",
      "level": 24,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
      "level": 25,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess",
      "level": 26,
      "line": 0
    },
    {
      "assembly": "Microsoft.Azure.Amqp, Version=2.4.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35",
      "method": "Microsoft.Azure.Amqp.Singleton`1+<GetOrCreateAsync>d__13.MoveNext",
      "level": 27,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
      "level": 28,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess",
      "level": 29,
      "line": 0
    },
    {
      "assembly": "Microsoft.Azure.EventHubs, Version=2.1.0.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c",
      "method": "Microsoft.Azure.EventHubs.Amqp.AmqpPartitionReceiver+<OnReceiveAsync>d__13.MoveNext",
      "level": 30,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
      "level": 31,
      "line": 0
    },
    {
      "assembly": "Microsoft.Azure.EventHubs, Version=2.1.0.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c",
      "method": "Microsoft.Azure.EventHubs.Amqp.AmqpPartitionReceiver+<OnReceiveAsync>d__13.MoveNext",
      "level": 32,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
      "level": 33,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess",
      "level": 34,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification",
      "level": 35,
      "line": 0
    },
    {
      "assembly": "Microsoft.Azure.EventHubs, Version=2.1.0.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c",
      "method": "Microsoft.Azure.EventHubs.PartitionReceiver+<ReceiveAsync>d__31.MoveNext",
      "level": 36,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw",
      "level": 37,
      "line": 0
    },
    {
      "assembly": "System.Private.CoreLib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e",
      "method": "System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess",
      "level": 38,
      "line": 0
    },
    {
      "assembly": "Microsoft.Azure.EventHubs, Version=2.1.0.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c",
      "method": "Microsoft.Azure.EventHubs.Amqp.AmqpPartitionReceiver+<ReceivePumpAsync>d__18.MoveNext",
      "level": 39,
      "line": 0
    }
  ],
  "severityLevel": "Error",
  "outerId": "0",
  "message": "A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.",
  "type": "System.Net.Sockets.SocketException",
  "id": "61997365"
}
@ghost ghost added the Needs: triage 🔍 label Jan 22, 2020
@alrod
Copy link
Member

alrod commented Jan 28, 2020

The error comes from:
https://github.com/Azure/azure-functions-eventhubs-extension/blob/dev/src/Microsoft.Azure.WebJobs.Extensions.EventHubs/Listeners/EventHubListener.cs#L160

The method called when the underlying client experiences an error while receiving. EventProcessorHost will take care of recovering from the error and continuing to pump messages.
https://docs.microsoft.com/en-us/dotnet/api/microsoft.azure.eventhubs.processor.ieventprocessor.processerrorasync?view=azure-dotnet

So nothing to worry about, your messages did not drop as there wasn't checkpoint operation.

@thomasmichaelwallace
Copy link
Author

Thanks @alrod

So if I understand correctly this is a bug. Because an expected exception (System.Net.Sockets.SocketException) is being logged with a high severity?

What's the best place to report/fix this?

@alrod
Copy link
Member

alrod commented Jan 28, 2020

I am not sure if this a valid bug:
The method called when the underlying client experiences an error while receiving. EventProcessorHost will take care of recovering from the error and continuing to pump messages.
https://docs.microsoft.com/en-us/dotnet/api/microsoft.azure.eventhubs.processor.ieventprocessor.processerrorasync?view=azure-dotnet

Errors during pumping new messages are expected and EventHub guranties a message a will be delivered at least once.

How often do you get SocketExceptions ? only during a night?

@thomasmichaelwallace
Copy link
Author

I guess the bug for me is the serverity (just like in #5).

Consider:

  • It's expected. So it's not exceptional.
  • If theres no action I can take (the event processor host recovers itself, and I don't think I even have access to it?) - then why do I need to know about it?

I am trying to monitor the event hub -> function app connection so that I can quickly react to any issues. For example, just today a company I'm working with revoked the shared access key, and but watching for errors I could quickly let them know and restore the string.

Having a couple of message with a high serveriy level where the only action we can take is to ignore them does not scalable. Especially when they happen at two in the morning!

@thomasmichaelwallace
Copy link
Author

thomasmichaelwallace commented Jan 28, 2020

(To answer your question - we've had 16 over the last week)

@alrod
Copy link
Member

alrod commented Jan 28, 2020

we can't change severity level for specific exception types. we consider ReceiverDisconnectedException and LeaseLostException as information because these exceptions can happen as part of normal partition balancing across instances.

16 per week is fairly low amount, I only can recommend adjust your alerting logic:

  1. Filter by specific exception types.
  2. Send an alert if you have more then x erros during y period of time.

@thomasmichaelwallace
Copy link
Author

Ok, I guess I'm just missing context here

The issue with only sending an alert if there are more than x a minute is that we risk missing actual exceptions. (For example, we get the odd storage exception that happens due to an error we need to resolve).

And if I filter by the socket exception, then it feels like we're all acknowledging that all socket errors expected in normal operation; and that there's no risk to ignoring them?

I guess I'm just missing the context as to why we can't alter the severity of the exception that we know is not exceptional?

Maybe there is an opportunity to document a list of exceptions, like these, which are thrown as part of the internals of the function app when discussing monitoring event hub integrations?

(As an aside, 16 might not sound like much, but there's only 10 devices pushing data; which is why it doesn't feel very scalable)

@thomasmichaelwallace
Copy link
Author

Or to put it another way. It feels like you're saying it's reasonable to expect anyone who wants to monitor an event hubs integration to build the following query:

exceptions
| where severityLevel > 1 // only alert on errors and warnings
| where innermostMessage != "The lease ID specified did not match the lease ID for the blob." // known bug: https://github.com/Azure/azure-functions-eventhubs-extension/issues/42
| where problemId != "System.Net.Sockets.SocketException at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw" // unexceptional: https://github.com/Azure/azure-functions-eventhubs-extension/issues/43
| where not (problemId startswith "System.ArgumentOutOfRangeException at Microsoft.Azure.EventHubs.Processor.PartitionContext") // unexceptional: https://stackoverflow.com/questions/51602965/azure-eventhub-function-leaselostexception

@thomasmichaelwallace
Copy link
Author

Closing due to inactivity.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

2 participants