Skip to content
This repository has been archived by the owner. It is now read-only.

RetryPolicy is causing transaction exception for connectivity issue #615

Closed
SeanFeldman opened this issue Dec 6, 2018 · 5 comments

Comments

Projects
None yet
2 participants
@SeanFeldman
Copy link
Collaborator

commented Dec 6, 2018

When an ambient transaction exists, operations such as SendAsync enlist into existing ambient transaction. RetryPolicy is taking that operation later and executes. When an operation is failing to execute due to connectivity problems, RetryPolicy re-executes that operation, which includes repeated transaction enlistment. This in turn throws an exception System.InvalidOperationException: 'Local transactions are not supported with other resource managers/DTC.' thrown by AmqpTransactionManager.EnlistAsync(), completely masquerading connectivity problem.

Stacktrace at Microsoft.Azure.ServiceBus.Core.MessageSender.OnSendAsync(IList`1 messageList) in C:\source\azure-service-bus-dotnet\src\Microsoft.Azure.ServiceBus\Core\MessageSender.cs:line 562 at Microsoft.Azure.ServiceBus.RetryPolicy.RunOperation(Func`1 operation, TimeSpan operationTimeout) in C:\source\azure-service-bus-dotnet\src\Microsoft.Azure.ServiceBus\RetryPolicy.cs:line 83 at Microsoft.Azure.ServiceBus.RetryPolicy.RunOperation(Func`1 operation, TimeSpan operationTimeout) in C:\source\azure-service-bus-dotnet\src\Microsoft.Azure.ServiceBus\RetryPolicy.cs:line 105 at Microsoft.Azure.ServiceBus.Core.MessageSender.SendAsync(IList`1 messageList) in C:\source\azure-service-bus-dotnet\src\Microsoft.Azure.ServiceBus\Core\MessageSender.cs:line 252 at ReproApp.Program.Main(String[] args) in C:\Users\Sean\Desktop\RetryPolicyRepro\ReproApp\Program.cs:line 33 at ReproApp.Program.(String[] args)

Expected Behavior

  1. Operation retried due to connectivity problem should not fail with incorrect transaction exception
  2. Transaction for the same operation should not be enlisted more than once

Steps to reproduce this issue:

  1. Run the following code (bellow, available at this repo)
  2. Block the outgoing TCP port 5671
static async Task Main(string[] args)
{
    var connectionString = Environment.GetEnvironmentVariable("AzureServiceBus_ConnectionString");

    var management = new ManagementClient(connectionString);

    if (!await management.QueueExistsAsync("queue"))
    {
        await management.CreateQueueAsync("queue");
    }

    var sender = new MessageSender(connectionString, "queue");

    while (true)
    {
        using (var tx = new TransactionScope(TransactionScopeOption.RequiresNew, TransactionScopeAsyncFlowOption.Enabled))
        {
            Debugger.Break();
            await sender.SendAsync(new Message(Encoding.Default.GetBytes(DateTime.Now.ToString("s"))));
            tx.Complete();
        }
    }

    await management.CloseAsync();
    await sender.CloseAsync();
}

Versions

  • OS platform and version: Windows 10
  • .NET Version: .NET Framework 4.6.1 / .NET Core 2.2
  • NuGet package version or commit ID: 3.2.0
@SeanFeldman

This comment has been minimized.

Copy link
Collaborator Author

commented Dec 6, 2018

Unfortunately it’s difficult for me to provide exact numbers related to how many of our customers this impacts, but just from an anecdotal perspective, just yesterday we had support cases related to this exception from 3 4 separate customers.

@nemakam

This comment has been minimized.

Copy link
Member

commented Dec 7, 2018

Well, for transactions, you cannot really have a retry within the same transaction.
The only way out would be to use RetryPolicy.None and have an uber retry outside of the transaction. Within a transaction its difficult to even define a retry. The first retry needs to fail.
The fix we will have to do in SDK is to throw an exception if they have a retry policy set within a transaction.

@SeanFeldman

This comment has been minimized.

Copy link
Collaborator Author

commented Dec 7, 2018

SBMP client did not have this issue. How did this work with SBMP client?

@SeanFeldman SeanFeldman referenced this issue Dec 7, 2018

Merged

Release 1.0 #29

@SeanFeldman

This comment has been minimized.

Copy link
Collaborator Author

commented Dec 10, 2018

Looks like the old client didn't have this issue and was bubbling connectivity exception up right away (MessagingCommunicationException). I've added a second project with the old client to show that.

@nemakam

This comment has been minimized.

Copy link
Member

commented Dec 11, 2018

SBMP client did not have this issue as the RetryPolicy in the older client never retried if there was an active transaction. That needs to be implemented in this client as well.

Explaining it with details for the benefit of everyone -

using (new txn)
{
                // retryable operation.
}

In case of SendAsync(), which is an easier point to talk about, lets say first send failed and the second send goes through. Lets say the first send failed with CommunicationException. At this point we don’t know whether the first send completed or not. So what is expected in this case as a client? Should the second send even happen within the same transaction since the first one failed? Transaction rules dictate all-or-none. Do the two operations behave as independent operations or a single operation?

Now if we talk about CompleteAsync(), it gets even more tricky. The first complete() went through all the way to broker, but again the client encountered CommunicationException. So client doesn’t know if the first one actually reached broker or not. In this case, client cannot retry. Lets say it retries, first complete will not succeed till the transaction is committed/aborted. On the service the second complete() is “blocked” on the first complete. And the transaction on the client will not complete till the second complete succeeds. It’s a deadlock and the only way out is when after 2 minutes the transaction times out. It’s a pretty bad experience for customers since every retry will timeout for sure after 2 minutes.

The fix is simple – NO RETRIES during transaction case.
The RetryPolicy should understand whether the operation is within a transaction scope or not. If it is, it should never retry. And this should be documented properly.

If you are stuck with this problem,
the immediate fix – make sure you don’t use any retry policy (i.e., use RetryPolicy.None instead).
Or almost immediate fix – raise a PR here where a retry doesn’t happen if transaction is active.
I hope that makes sense.

@SeanFeldman SeanFeldman added the bug label Dec 13, 2018

@SeanFeldman SeanFeldman added this to the 3.2.1 milestone Dec 13, 2018

@nemakam nemakam closed this in #621 Dec 14, 2018

nemakam added a commit that referenced this issue Dec 14, 2018

Do not retry when there's an ambient transaction (#621)
* Do not retry when there's an ambient transaction
Fixes #615
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.