Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ChannelProvider Publish reconnection attempts continue forever after IEndpointInstance had been stopped #1268

Open
mstavrev opened this issue Jun 20, 2023 · 9 comments · May be fixed by #1270
Labels

Comments

@mstavrev
Copy link

mstavrev commented Jun 20, 2023

Describe the bug

Description

Our application manages endpoints on the runtime, e.g. creates and terminates various endpoints depending on application logic. Due to the dynamic nature of this process, it is possible that an endpoint need to be terminated, while the RabbitMQ broker it used to be connected to isn't available, for example had been shutdown.
Note: A precondition is to have the bus endpoint successfully be created and connected to the broker in the first place.

Expected behavior

When IEndpointInstance.Stop() had completed without an error (Task await completed without an exception), the RabbitMQ transport that backed up this bus must had been shutdown as well.

Actual behavior

When endpoint is stopped while it is in an error state (RabbitMQ broker had been shutdown for example), NServiceBus continues to periodically log entries such as:

2023-06-20 14:52:08.0278 [26] INFO NServiceBus.Transport.RabbitMQ.ChannelProvider 'XXX Publish': Reconnecting to the broker failed: RabbitMQ.Client.Exceptions.BrokerUnreachableException: None of the specified endpoints were reachable

 ---> System.AggregateException: One or more errors occurred. (Connection failed)
 ---> RabbitMQ.Client.Exceptions.ConnectFailureException: Connection failed
 ---> System.Net.Sockets.SocketException (10060): A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token)
   at System.Threading.Tasks.ValueTask.ValueTaskSourceAsTask.<>c.<.cctor>b__4_0(Object state)
--- End of stack trace from previous location ---
   at RabbitMQ.Client.Impl.TcpClientAdapter.ConnectAsync(String host, Int32 port)
   at RabbitMQ.Client.Impl.TaskExtensions.TimeoutAfter(Task task, TimeSpan timeout)
   at RabbitMQ.Client.Impl.SocketFrameHandler.ConnectOrFail(ITcpClient socket, AmqpTcpEndpoint endpoint, TimeSpan timeout)
   --- End of inner exception stack trace ---
   at RabbitMQ.Client.Impl.SocketFrameHandler.ConnectOrFail(ITcpClient socket, AmqpTcpEndpoint endpoint, TimeSpan timeout)
   at RabbitMQ.Client.Impl.SocketFrameHandler.ConnectUsingAddressFamily(AmqpTcpEndpoint endpoint, Func`2 socketFactory, TimeSpan timeout, AddressFamily family)
   at RabbitMQ.Client.Impl.SocketFrameHandler..ctor(AmqpTcpEndpoint endpoint, Func`2 socketFactory, TimeSpan connectionTimeout, TimeSpan readTimeout, TimeSpan writeTimeout)
   at RabbitMQ.Client.ConnectionFactory.CreateFrameHandler(AmqpTcpEndpoint endpoint)
   at RabbitMQ.Client.EndpointResolverExtensions.SelectOne[T](IEndpointResolver resolver, Func`2 selector)
   --- End of inner exception stack trace ---
   at RabbitMQ.Client.EndpointResolverExtensions.SelectOne[T](IEndpointResolver resolver, Func`2 selector)
   at RabbitMQ.Client.ConnectionFactory.CreateConnection(IEndpointResolver endpointResolver, String clientProvidedName)
   --- End of inner exception stack trace ---
   at RabbitMQ.Client.ConnectionFactory.CreateConnection(IEndpointResolver endpointResolver, String clientProvidedName)
   at NServiceBus.Transport.RabbitMQ.ConnectionFactory.CreateConnection(String connectionName, Boolean automaticRecoveryEnabled, Int32 consumerDispatchConcurrency) in /_/src/NServiceBus.Transport.RabbitMQ/Connection/ConnectionFactory.cs:line 142
   at NServiceBus.Transport.RabbitMQ.ConnectionFactory.CreatePublishConnection() in /_/src/NServiceBus.Transport.RabbitMQ/Connection/ConnectionFactory.cs:line 114
   at NServiceBus.Transport.RabbitMQ.ChannelProvider.CreateConnection() in /_/src/NServiceBus.Transport.RabbitMQ/Connection/ChannelProvider.cs:line 24
   at NServiceBus.Transport.RabbitMQ.ChannelProvider.ReconnectSwallowingExceptions(String connectionName) in /_/src/NServiceBus.Transport.RabbitMQ/Connection/ChannelProvider.cs:line 51 
2023-06-20 14:52:08.0278 [26] INFO NServiceBus.Transport.RabbitMQ.ChannelProvider 'XXX Publish': Attempting to reconnect in 10 seconds. 
...

This means, the transport had not been completely shutdown and cleaned up. This creates needless load on our application as this may occur for multiple bus entries. Additionally it creates error logging (yes, we know we can suppress it for the time being) that's cluttering our application log.

Versions

8.0.2

Steps to reproduce

  1. Have a running and accessible RabbitMQ server
  2. Create and start an endpoint
  3. Stop the RabbitMQ server
  4. Wait a few seconds until NServiceBus Transport and core detect the problem and begin re-connection attempts
  5. Stop the endpoint, but ensure the process remains active (await Task.Delay(-1))

☝🏼 Observe that the console and log indicate that the transport is still reconnecting.

  1. Start the RabbitMQ server

☝🏼 Observe that the console and log indicate that the transport reconnects and that a connection to the broker remains active.

Relevant log output

No response

Additional Information

Workarounds

I have not found a way to mitigate it with application-level code.

Possible solutions

Additional information

@ramonsmits
Copy link
Member

@mstavrev Am I correct in interpreting that the endpoint Stop does complete but remains active in the background or does the Stop throw an error?

@mstavrev
Copy link
Author

The Stop call completes without an exception.

@ramonsmits
Copy link
Member

@mstavrev I'll try and repro this Today and triage this.

How is this affecting you at the moment besides the excessive logging behavior?

@ramonsmits ramonsmits self-assigned this Jun 22, 2023
@mstavrev
Copy link
Author

It isn't critical.

@ramonsmits
Copy link
Member

@mstavrev I've been able to reproduce this and updated the "Steps to reproduce" with an additional observation.

@mstavrev
Copy link
Author

Thanks for the update.

@mstavrev
Copy link
Author

Any estimation when the fix shall be released?

@TravisNickels
Copy link
Member

The bug is currently in queue to be fixed, but there is no estimated time frame for when it will be released. Continue to subscribe to the issue to receive future updates.

@ramonsmits
Copy link
Member

@mstavrev fix is here if you are severely affected but we didn't reach consensus yet on this fix

@ramonsmits ramonsmits removed their assignment Jun 27, 2024
@ramonsmits ramonsmits added the bug label Jul 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants