MessageSender doesn't recover from lost connection, violates LockDuration #639

Open
SeanFeldman opened this Issue Jan 25, 2019 · 7 comments

@SeanFeldman
Collaborator

SeanFeldman commented Jan 25, 2019

Actual Behavior

When sending messages (transaction + send-via from a message handler, in response to an incoming message), everything works until there's a connection loss. Once the connection is lost, it is re-established. The message receiver behind the message handler recovers and picks up messages, but the message sender does not. Instead, the incoming message is received and, whenever a send operation is attempted, it immediately fails. The exception handler logs a ServiceBusException with the message "The operation was canceled."

Full stacktrace:

Stacktrace
   at Microsoft.Azure.ServiceBus.Core.MessageSender.<OnSendAsync>d__52.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.ServiceBus.RetryPolicy.<RunOperation>d__19.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at Microsoft.Azure.ServiceBus.RetryPolicy.<RunOperation>d__19.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.ServiceBus.Core.MessageSender.<SendAsync>d__39.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.GetResult()
   at UserQuery.<>c__DisplayClass0_0.<<Main>b__0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.Azure.ServiceBus.MessageReceivePump.<MessageDispatchTask>d__13.MoveNext()

Additional observation: while the exception is occurring, despite the receive mode being PeekLock, the message is received again immediately after the exception, without waiting for the lock duration to expire, incrementing DeliveryCount until the maximum is reached and the message is dead-lettered.

Expected Behaviour

  1. Message sender should recover and not fail upon sending.
  2. LockDuration should be respected.

Steps to reproduce

  1. Set up two queues, source (src) and destination (dst).
  2. Register a message handler on src with AutoComplete=false and MaxConcurrentCalls=1.
  3. Configure a MessageSender to send to dst using send-via (through src).
  4. In the message handler callback:
    1. Within a transaction scope, send a message using the MessageSender mentioned earlier.
    2. Complete the incoming message.
    3. Complete the transaction scope.
  5. Terminate the TCP connection using an external tool.
  6. Immediately send a message to src using a different tool/app, before the test application's connection is restored.
  7. See the message received and the send operation fail with the aforementioned exception, until the number of receives reaches the MaxDeliveryCount defined on src.
    • Notice the LockDuration of the incoming message is entirely ignored as well.
Repro code
// LinqPad script: uses .Dump() and Util.ReadLine(). Requires the Microsoft.Azure.ServiceBus
// NuGet package and the System.Transactions, Microsoft.Azure.ServiceBus,
// Microsoft.Azure.ServiceBus.Core and Microsoft.Azure.ServiceBus.Management namespaces.
async Task Main()
{
	var connectionString = Environment.GetEnvironmentVariable("AzureServiceBus_ConnectionString");
	var src = "src";
	var dest = "dst";
	
	await Initiate(connectionString, src, dest);

	var connection = new ServiceBusConnection(connectionString);
	var queueClient = new QueueClient(connection, src, ReceiveMode.PeekLock, RetryPolicy.Default);
	// Send-via sender: messages destined for dst are transferred through src, so the send
	// can enlist in the same transaction as the incoming message.
	var sender = new MessageSender(connection, dest, src);
	var counter = 1;

	queueClient.RegisterMessageHandler(async (message, token) =>
	{
		$"received {message.MessageId}".Dump();

		// Send and complete atomically: both operations enlist in the same ambient transaction.
		using (var scope = new TransactionScope(TransactionScopeOption.RequiresNew, TransactionScopeAsyncFlowOption.Enabled))
		{
			await sender.SendAsync(new Message());
			$"sent".Dump();
		
			await queueClient.CompleteAsync(message.SystemProperties.LockToken);
			
			scope.Complete();
		}
	},
	new MessageHandlerOptions(exceptionArgs =>
	{
		exceptionArgs.Dump();
		//$"failuer happened {counter++} times. Exception: {exceptionArgs.Exception.Message}".Dump();
		return Task.CompletedTask;
	})
	{
		AutoComplete = false,
		MaxConcurrentCalls = 1
	});

	Console.WriteLine("Press enter to exit");
	Util.ReadLine();
}

async Task Initiate(string connectionString, string src, string dest)
{
	var client = new ManagementClient(connectionString);

	await Task.WhenAll(execute(src), execute(dest));

	await client.CloseAsync();

	async Task execute(string queueName)
	{
		if (await client.QueueExistsAsync(queueName))
		{
			await client.DeleteQueueAsync(queueName);
		}
		await client.CreateQueueAsync(new QueueDescription(queueName)
		{
			MaxDeliveryCount = 100,
			LockDuration = TimeSpan.FromMinutes(5)
		});
	}
}

Repro code (LinqPad file)

Also, this only happens when using a transaction scope. Without a transaction scope, the MessageSender recovers and no failure occurs.
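
For comparison, a rough sketch (reusing the queueClient and sender variables from the repro above) of the handler registration without the transaction scope, i.e. the variant that does recover after a connection loss:

// Hypothetical non-transactional variant of the repro handler: the send and the
// complete are independent operations and the sender re-establishes its link.
queueClient.RegisterMessageHandler(async (message, token) =>
{
	await sender.SendAsync(new Message());
	await queueClient.CompleteAsync(message.SystemProperties.LockToken);
},
new MessageHandlerOptions(exceptionArgs => Task.CompletedTask)
{
	AutoComplete = false,
	MaxConcurrentCalls = 1
});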

Versions

  • OS platform and version: Windows 10 64bit
  • .NET Version: .NET 4.7.1
  • NuGet package version or commit ID: 3.3.0

@SeanFeldman SeanFeldman added the bug label Jan 25, 2019

@axisc

Collaborator

axisc commented Jan 27, 2019

@SeanFeldman thanks for reporting this. Agree this is a bug. We'll investigate this internally.

@basvanmeer


basvanmeer commented Feb 5, 2019

Hi, we are wondering if there is any progress on this issue? We are hitting it in our production environment and have no alternative other than to not use Azure Service Bus.

@siimv


siimv commented Feb 7, 2019

I am experiencing a similar problem, although it seems to be related to MessageReceiver. I have two different stack traces. It occurs after the handler has run successfully and it tries to complete the message. I do not send any other messages from the handler.

One:

Microsoft.Azure.ServiceBus.ServiceBusException: The operation was canceled. ---> System.OperationCanceledException: The operation was canceled.
   at Microsoft.Azure.Amqp.AsyncResult.End[TAsyncResult](IAsyncResult result)
   at Microsoft.Azure.Amqp.SendingAmqpLink.EndSendMessage(IAsyncResult result)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.Azure.Amqp.Transaction.Controller.DeclareAsync()
   at Microsoft.Azure.ServiceBus.Amqp.AmqpTransactionEnlistment.OnCreateAsync(TimeSpan timeout)
   at Microsoft.Azure.Amqp.Singleton`1.GetOrCreateAsync(TimeSpan timeout)
   at Microsoft.Azure.Amqp.Singleton`1.GetOrCreateAsync(TimeSpan timeout)
   at Microsoft.Azure.ServiceBus.Amqp.AmqpTransactionManager.EnlistAsync(Transaction transaction, ServiceBusConnection serviceBusConnection)
   at Microsoft.Azure.ServiceBus.Core.MessageReceiver.DisposeMessagesAsync(IEnumerable`1 lockTokens, Outcome outcome)
   --- End of inner exception stack trace ---
   at Microsoft.Azure.ServiceBus.Core.MessageReceiver.DisposeMessagesAsync(IEnumerable`1 lockTokens, Outcome outcome)
   at Microsoft.Azure.ServiceBus.RetryPolicy.RunOperation(Func`1 operation, TimeSpan operationTimeout)
   at Microsoft.Azure.ServiceBus.RetryPolicy.RunOperation(Func`1 operation, TimeSpan operationTimeout)
   at Microsoft.Azure.ServiceBus.Core.MessageReceiver.CompleteAsync(IEnumerable`1 lockTokens)

And another:

Microsoft.Azure.ServiceBus.ServiceBusException: 'sender23' is closed
   at Microsoft.Azure.ServiceBus.Core.MessageReceiver.DisposeMessagesAsync(IEnumerable`1 lockTokens, Outcome outcome)
   at Microsoft.Azure.ServiceBus.RetryPolicy.RunOperation(Func`1 operation, TimeSpan operationTimeout)
   at Microsoft.Azure.ServiceBus.RetryPolicy.RunOperation(Func`1 operation, TimeSpan operationTimeout)
   at Microsoft.Azure.ServiceBus.Core.MessageReceiver.CompleteAsync(IEnumerable`1 lockTokens)
@marcelvwe


marcelvwe commented Feb 11, 2019

We are experiencing the same problem in our production environment. Messages are retried until they are moved to the dead-letter queue. This issue is preventing us from migrating away from the (I now believe deprecated) full framework client.

@axisc

Collaborator

axisc commented Feb 11, 2019

Thanks for sharing the specifics. We're still investigating this, but don't have an ETA for now. Will share one as soon as we have an update.

@labarilem


labarilem commented Feb 12, 2019

We're having the same issue in our production environment. Here's an exception sample:

Microsoft.Azure.ServiceBus.ServiceBusException: 'sender13' is closed
   at Microsoft.Azure.ServiceBus.Core.MessageSender.OnSendAsync(IList`1 messageList)
   at Microsoft.Azure.ServiceBus.RetryPolicy.RunOperation(Func`1 operation, TimeSpan operationTimeout)
   at Microsoft.Azure.ServiceBus.RetryPolicy.RunOperation(Func`1 operation, TimeSpan operationTimeout)
   at Microsoft.Azure.ServiceBus.Core.MessageSender.SendAsync(IList`1 messageList)

Once this exception comes up, the message receiver stops working and these exceptions keep bubbling up every time the sender tries to send a message.
We noticed that the sender ID (e.g. 'sender13') is the same in all the successive exceptions.

As a workaround, we're going to dispose and re-instantiate message senders when this error comes up. Are there better ways to do it?
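
For reference, a rough sketch of that dispose-and-recreate approach, assuming the connection, dest and src variables from the repro code above (per the reply below, this may not be sufficient when a transaction scope is involved):

// Hypothetical workaround: drop the broken sender and build a fresh one on the same
// connection before retrying the send. Sketch only, not a confirmed fix.
async Task SendWithRecreateAsync(Message message)
{
	try
	{
		await sender.SendAsync(message);
	}
	catch (ServiceBusException)
	{
		await sender.CloseAsync();
		sender = new MessageSender(connection, dest, src);
		await sender.SendAsync(message);
	}
}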

@SeanFeldman

Collaborator Author

SeanFeldman commented Feb 13, 2019

There is no workaround other than restarting your service/receiver.
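
A rough sketch of what restarting the receiver could look like with the repro code's variables (connectionString, connection, queueClient, sender); this is only an illustration, not a confirmed procedure:

// Hypothetical receiver restart: close the existing client and connection, then
// recreate both and re-register the message handler as in the repro code.
await queueClient.CloseAsync();
await connection.CloseAsync();

connection = new ServiceBusConnection(connectionString);
queueClient = new QueueClient(connection, src, ReceiveMode.PeekLock, RetryPolicy.Default);
sender = new MessageSender(connection, dest, src);
// ...re-register the message handler exactly as in the repro code above.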
