Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TransactionAbortedException when performing queries in parallel inside a transaction scope #1675

Closed
joostmeijles opened this issue Jul 20, 2022 · 62 comments · Fixed by #2301
Closed

Comments

@joostmeijles
Copy link

joostmeijles commented Jul 20, 2022

Describe the bug

When performing multiple queries in parallel, and each pair of 2 queries are inside a transaction scope, an unexpected transaction error is thrown (see below).

Exception message: System.Transactions.TransactionAbortedException: The transaction has aborted.
Stack trace:
Unhandled exception. System.Transactions.TransactionAbortedException: The transaction has aborted.
 ---> System.Transactions.TransactionPromotionException: Failure while attempting to promote transaction.
 ---> System.InvalidOperationException: The requested operation cannot be completed because the connection has been broken.
   at Microsoft.Data.SqlClient.SqlInternalConnectionTds.ExecuteTransaction(TransactionRequest transactionRequest, String name, IsolationLevel iso, SqlInternalTransaction internalTransaction, Boolean isDelegateControlRequest)
   at Microsoft.Data.SqlClient.SqlDelegatedTransaction.Promote()
   --- End of inner exception stack trace ---
   at Microsoft.Data.SqlClient.SqlDelegatedTransaction.Promote()
   at System.Transactions.TransactionStatePSPEOperation.PSPEPromote(InternalTransaction tx)
   at System.Transactions.TransactionStateDelegatedBase.EnterState(InternalTransaction tx)
   --- End of inner exception stack trace ---
   at System.Transactions.TransactionStateAborted.CheckForFinishedTransaction(InternalTransaction tx)
   at System.Transactions.EnlistableStates.Promote(InternalTransaction tx)
   at System.Transactions.Transaction.Promote()
   at System.Transactions.TransactionInterop.ConvertToDistributedTransaction(Transaction transaction)
   at System.Transactions.TransactionInterop.GetExportCookie(Transaction transaction, Byte[] whereabouts)
   at Microsoft.Data.SqlClient.SqlInternalConnection.GetTransactionCookie(Transaction transaction, Byte[] whereAbouts)
   at Microsoft.Data.SqlClient.SqlInternalConnection.EnlistNonNull(Transaction tx)
   at Microsoft.Data.ProviderBase.DbConnectionInternal.ActivateConnection(Transaction transaction)
   at Microsoft.Data.ProviderBase.DbConnectionPool.PrepareConnection(DbConnection owningObject, DbConnectionInternal obj, Transaction transaction)
   at Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, UInt32 waitForMultipleObjectsTimeout, Boolean allowCreate, Boolean onlyOneCheckConnection, DbConnectionOptions userOptions, DbConnectionInterna
l& connection)
   at Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
   at Microsoft.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource`1 retry, SqlConnectionOverrides overrides)
   at Microsoft.Data.SqlClient.SqlConnection.InternalOpenAsync(CancellationToken cancellationToken)
--- End of stack trace from previous location ---

To reproduce

Below code reproduces the error. It runs 100K times 2 queries in parallel where each 2 queries are inside a single transaction scope.
Note that after each query the database connection is disposed (and thus closed). This is important to trigger the error.
When we change the code, and use 1 connection for both queries and only open 1 connection the error does not occur (see also code comments below).

using System.Transactions;
using Microsoft.Data.SqlClient;

static async Task PerformTransactionWithQuery(int num)
{
    try
    {
        using (new TransactionScope(TransactionScopeAsyncFlowOption.Enabled))
        {
            // Adding "max pool size=1000;" to the connection string seems to trigger the problem less often
            string connStr = @"Server=.\SQLEXPRESS;Database=master;Trusted_Connection=True;Encrypt=False";

            var query = "SELECT COUNT(*) FROM sys.dm_tran_active_transactions";

            await using (var dbConn = new SqlConnection(connStr))
            {
                await dbConn.OpenAsync();

                await using (var command1 = new SqlCommand(query, dbConn))
                {
                    await command1.ExecuteScalarAsync();
                }
            } // Connection is disposed (and thus closed)

            await using (var dbConn = new SqlConnection(connStr)) 
            {
                // Reopening the connection triggers the following error:
                // System.Transactions.TransactionAbortedException: The transaction has aborted.
                //
                // NB. Using a single connection and opening it once does NOT trigger the error
                await dbConn.OpenAsync();
                
                await using (var command2 = new SqlCommand(query, dbConn))
                {
                    await command2.ExecuteScalarAsync();
                }
            } // Connection is disposed (and thus closed)
            
            //Do not complete transaction
        }
    }
    catch (Exception e)
    {
        Console.WriteLine($"Failed {num}");
        throw;
    }
} 

var tasks = Enumerable.Range(0, 100000).ToList().Select(PerformTransactionWithQuery);
await Task.WhenAll(tasks);

Expected behavior

Being able to use multiple connections (with the same connection string) in sequence within the same transaction without errors.

Further technical details

Microsoft.Data.SqlClient version: 4.1.0
.NET target: .NET 6
SQL Server version: SQL Server 2019
Operating system: Windows 11

@joostmeijles joostmeijles changed the title TransactionAbortedException when querying in parallel inside a transaction TransactionAbortedException when performing queries in parallel inside a transaction scope Jul 20, 2022
@DavoudEshtehari DavoudEshtehari added this to Needs triage in SqlClient Triage Board via automation Jul 22, 2022
@DavoudEshtehari
Copy link
Member

@joostmeijles Thank you for the repro. I'll back to you if I had any questions to reproduce the issue.

@joostmeijles
Copy link
Author

@DavoudEshtehari any update on an ETA to verify that this is a bug and get it fixed?

@DavoudEshtehari
Copy link
Member

@joostmeijles I confirm it's reproducible with MDS and we'll return to it after the next release. I'm afraid we can't provide any estimation for a fix while it's in the investigation stage. We'll keep updating here with any findings, and we welcome any contribution as this is an open-source driver.

@DavoudEshtehari DavoudEshtehari moved this from Needs triage to Under Investigation in SqlClient Triage Board Aug 4, 2022
@JRahnama
Copy link
Member

JRahnama commented Aug 5, 2022

@joostmeijles can you test with a reduced MaxPoolSize to 20? I wonder if this is something similar to #422?

@joostmeijles
Copy link
Author

joostmeijles commented Aug 8, 2022

@joostmeijles can you test with a reduced MaxPoolSize to 20? I wonder if this is something similar to #422?

Changing the pool size to 20 does not change much, but setting it very large (e.g. 1000) seems to trigger the problem less often.

@JRahnama
Copy link
Member

JRahnama commented Aug 8, 2022

@joostmeijles one more question, were you able to see the same problem on Windows 10?

@JRahnama
Copy link
Member

JRahnama commented Aug 8, 2022

@joostmeijles regarding the comment in the repro

// NB. Using a single connection and opening it once does NOT trigger the error

does this mean if you open a connection first and then try the rest of the code is working fine or just a single connection?

@joostmeijles
Copy link
Author

@joostmeijles regarding the comment in the repro

// NB. Using a single connection and opening it once does NOT trigger the error

does this mean if you open a connection first and then try the rest of the code is working fine or just a single connection?

It means changing the code to:

await using (var dbConn = new SqlConnection(connStr))
{
    await dbConn.OpenAsync();

    await using (var command1 = new SqlCommand(query, dbConn))
    {
        await command1.ExecuteScalarAsync();
    }

    await using (var command2 = new SqlCommand(query, dbConn))
    {
        await command2.ExecuteScalarAsync();
    }
} // Connection is disposed (and thus closed)

does not trigger the error.

@joostmeijles
Copy link
Author

@joostmeijles one more question, were you able to see the same problem on Windows 10?

Yes, same.

@JRahnama
Copy link
Member

JRahnama commented Aug 9, 2022

@joostmeijles couple of things I noticed:

  1. If we set the pooling to false the issue does not show up (this confirms the issue only occurs with pooling)
  2. if you set SqlConnection.ClearAllPools(); before the second using of dbConn your code should be working fine as well.
  3. If we start with one connection first and then try to open the rest the connections the issue does not com up.
    The above makes me suspicious that we need to adjust the code to do pool warming.
    Too many parallel connections are trying to create the very same pool. If we let one connection to get opened a pool is stablished and the rest will use that.

@David-Engel
Copy link
Contributor

David-Engel commented Aug 9, 2022

Edit: Timeout isn't the issue here.

It sounds to me like your transaction is timing out. With 100,000 parallel operations and a limited connection pool and thread pool, there will likely be a significant delay between the first query completing, the connection closing, and the second connection being obtained from the pool during the transaction. Idle transactions aren't kept around forever. The default transaction timeout is only one minute.

Useful blog post: https://docs.microsoft.com/en-za/archive/blogs/dbrowne/using-new-transactionscope-considered-harmful

@joostmeijles
Copy link
Author

joostmeijles commented Aug 10, 2022

It sounds to me like your transaction is timing out. With 100,000 parallel operations and a limited connection pool and thread pool, there will likely be a significant delay between the first query completing, the connection closing, and the second connection being obtained from the pool during the transaction. Idle transactions aren't kept around forever. The default transaction timeout is only one minute.

Useful blog post: https://docs.microsoft.com/en-za/archive/blogs/dbrowne/using-new-transactionscope-considered-harmful

An error possibly occur within seconds, so doesn't seem to hit a timeout. Also, the 100K parallel operations are just to make it easier to reproduce. I have seen the issue also with less (~100) parallel operations.

Creating the transaction scope as follows (as mentioned in the blog post + async option) results in the same error.

static TransactionScope CreateTransactionScope()
{
    var transactionOptions = new TransactionOptions();
    transactionOptions.IsolationLevel = IsolationLevel.ReadCommitted;
    transactionOptions.Timeout = TransactionManager.MaximumTimeout;
    return new TransactionScope(TransactionScopeOption.Required, transactionOptions, TransactionScopeAsyncFlowOption.Enabled);
}

@joostmeijles
Copy link
Author

@joostmeijles couple of things I noticed:

  1. If we set the pooling to false the issue does not show up (this confirms the issue only occurs with pooling)
  2. if you set SqlConnection.ClearAllPools(); before the second using of dbConn your code should be working fine as well.
  3. If we start with one connection first and then try to open the rest the connections the issue does not com up.
    The above makes me suspicious that we need to adjust the code to do pool warming.
    Too many parallel connections are trying to create the very same pool. If we let one connection to get opened a pool is stablished and the rest will use that.

Without connection pooling (on the connection string or by clearing the pool before the second connection) System.PlatformNotSupportedException: This platform does not support distributed transactions. will be thrown immediately.

@DavoudEshtehari
Copy link
Member

DavoudEshtehari commented Oct 25, 2022

Just posting an update:

I could reproduce this issue with Microsoft.Data.SqlClient since version 1.1.4 against supported target frameworks (.NET/.NET Core or .NET Framework). And System.Data.SqlClient doesn't throw "TransactionAbortedException".

This issue has been introduced by PR #543 for issue #237, which explains why it doesn't occur against SDS.

@DavoudEshtehari
Copy link
Member

@joostmeijles Using the NuGet package from the pipeline that is introduced with PR #1814 I experienced some improvement with this issue. May I ask you to verify it with the provided link, please?

@joostmeijles
Copy link
Author

joostmeijles commented Oct 25, 2022

@DavoudEshtehari thanks for the update! Sounds as a good change, but unfortunately I am still getting an error, although its a different one:

Unhandled exception. System.InvalidOperationException: The requested operation cannot be completed because the connection has been broken.
   at Microsoft.Data.SqlClient.SqlInternalConnectionTds.ExecuteTransaction(TransactionRequest transactionRequest, String name, IsolationLevel iso, SqlInternalTransaction internalTransaction, Boolean isDelegateControlRequest)
   at Microsoft.Data.SqlClient.SqlDelegatedTransaction.Initialize()
   at System.Transactions.TransactionStatePSPEOperation.PSPEInitialize(InternalTransaction tx, IPromotableSinglePhaseNotification promotableSinglePhaseNotification, Guid promoterType)
   at System.Transactions.TransactionStateActive.EnlistPromotableSinglePhase(InternalTransaction tx, IPromotableSinglePhaseNotification promotableSinglePhaseNotification, Transaction atomicTransaction, Guid promoterType)
   at System.Transactions.Transaction.EnlistPromotableSinglePhase(IPromotableSinglePhaseNotification promotableSinglePhaseNotification, Guid promoterType)
   at System.Transactions.Transaction.EnlistPromotableSinglePhase(IPromotableSinglePhaseNotification promotableSinglePhaseNotification)
   at Microsoft.Data.SqlClient.SqlInternalConnection.EnlistNonNull(Transaction tx)
   at Microsoft.Data.ProviderBase.DbConnectionInternal.ActivateConnection(Transaction transaction)
   at Microsoft.Data.ProviderBase.DbConnectionPool.PrepareConnection(DbConnection owningObject, DbConnectionInternal obj, Transaction transaction)
   at Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, UInt32 waitForMultipleObjectsTimeout, Boolean allowCreate, Boolean onlyOneCheckConnection, DbConnectionOptions userOptions, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
   at Microsoft.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource`1 retry, SqlConnectionOverrides overrides)
   at Microsoft.Data.SqlClient.SqlConnection.InternalOpenAsync(CancellationToken cancellationToken)

@DavoudEshtehari
Copy link
Member

@joostmeijles thank you for sharing the result.
Also, I can see some improvements in the exception occurrence. With the proposed change it is less likely to happen (by increasing to over 1M iterations), which is not a constant behaviour and depends on the available machine's resources. Can you verify if you see the same progress, please?

@joostmeijles
Copy link
Author

joostmeijles commented Oct 26, 2022

It fluctuates, doing 100K iterations. Some runs pass, others still bring up the error 1 or 2 times.

@sdrapkin
Copy link

I can reproduce it under both MDS-5.1.0-preview2.22314.2 and SDS-4.8.5 on Windows10, under NET5, NET6, and NET7. Making a query is not even required to reproduce (might take a few runs). Uncommenting ts.Complete() stops triggering the exceptions. IMHO this is a bug.

LinqPad:

async Task Main()
{
	var tasks = Enumerable.Range(0, 100_000).Select(PerformWork);
	await Task.WhenAll(tasks);
	"DONE!".Dump();
}
const string connString = "Server=.\\SQL2019;Database=TempDB;Application Name=TestApp;Trusted_Connection=True;TrustServerCertificate=True;Pooling=True;Encrypt=True;Max Pool Size = 5000;";

static TransactionScope CreateTransactionScope() =>
	new TransactionScope(TransactionScopeOption.Required,
		new TransactionOptions { IsolationLevel = System.Transactions.IsolationLevel.ReadCommitted }, TransactionScopeAsyncFlowOption.Enabled);

static async Task PerformWork(int num)
{
	try
	{
		using (var ts = CreateTransactionScope())
		{
			for (int i = 0; i < 2; ++i)
			{
				await using (var dbConn = new SqlConnection(connString))
				{
					await dbConn.OpenAsync();
					await Task.Delay(1);
				} // Connection is disposed (and thus closed)
			}//for
			//ts.Complete();
		}//ts
	}//try
	catch (Exception e)
	{
		Console.WriteLine($"Failed {num}");
		e.Dump(); Environment.Exit(-1);
		throw;
	}
}

@cheenamalhotra
Copy link
Member

cheenamalhotra commented Nov 16, 2022

@joostmeijles @sdrapkin

What you're trying to do is unsafe and I don't agree this is bug. Old behavior before v1.1.4 was a bug where accounting was not done for externally aborted transactions. I would request a read on #237 (comment) and #237 (comment) to understand the root cause.

How distributed transactions work is that they are decoupled from your application and are running within the MS DTC service. Which means if your app tries to be super-fast and expect MS DTC to keep coordinating with SQL Server for the MISSED commit/rollbacks, it needs time to do so, as it needs to abort the transaction on the server.

From driver end, if you closed a POOLED connection, server-side connection is kept open. So now if you happen to re-open the connection BEFORE MSDTC could terminate your previous transaction, you find that AFTER opening the connection, transaction gets aborted. It should never happen that the same OLD transaction continues to be used, and I hope you'd agree with me on that! So, someone must keep things in order. Either it's your app, or the driver.

If you want driver to consistently terminate your connection alongwith Distributed transaction, DON'T use connection pool!
If you want to use connection pool for performance needs, you MUST manage your transaction lifecycle and cannot leave them to be aborted by server.

Likewise, in the above repro, if you do call ts.Complete() things do just fine!

I would certainly recommend documentation on this somewhere so it's not a surprise to end users as the default behavior of driver is with pooling enabled, which is clearly not intuitive to users. cc @David-Engel

@cheenamalhotra
Copy link
Member

cheenamalhotra commented Nov 16, 2022

Also, to address @DavoudEshtehari 's PR #1814 and why it made visible difference:

When the PR changed Pool design from Stack to a Queue, it's clear that instead of fetching the latest connection from stack, you'd always get the oldest connection, which gives more time to the recently added connection in the pool to terminate transaction.

Since it's just pushing the latest connection to end of queue, it's not a solution IMO. You can still reproduce this behavior with a single connection going in and out of the pool if you limit your pool size.

@joostmeijles
Copy link
Author

@cheenamalhotra

As far as I understand the provided example code should not result in a distributed transaction as the connection gets closed before the next connection is opened.

Please note that I am not using the MS DTC service (its not running on my machine). And I think that the .NET version I use does not support distributed transactions?

@sdrapkin
Copy link

@cheenamalhotra There are no distributed transactions in my LinqPad example. MSDTC escalation does not happen because only one connection-string is used. If you replace await dbConn.OpenAsync(); with dbConn.Open(); (ie. merely switch from async to sync opening of a connection), this code works just fine (ie. even with ts.Complete() commented out).

When the 1st connection-open request occurs, it should enlist that connection into the Transaction. Then 1st-connection-close occurs (but the inner-connection is not really closed - just returns to the pool). Then the 2nd connection-open request should receive the exact same already-open inner-connection back from the pool (ie. the same one durable resource is enlisted). This is exactly what happens when dbConn.Open() is used. There should be no observable behavior differences between dbConn.Open(); and await dbConn.OpenAsync(); - yet the async version crashes.

@cheenamalhotra
Copy link
Member

If you replace await dbConn.OpenAsync(); with dbConn.Open(); (ie. merely switch from async to sync opening of a connection), this code works just fine (ie. even with ts.Complete() commented out).

This is interesting and I'd agree I didn't look into that to be the possibility, so it certainly points to something missing out with OpenAsync(), which is very strange since both APIs 'should' follow the same path. 🤔

There are no distributed transactions

I possibly was imagining .NET 7, but never mind. It's still a case of delegated transaction as we're looking at System.Transactions.TransactionPromotionException

Let me dive into it and will get back.

@cheenamalhotra
Copy link
Member

cheenamalhotra commented Nov 16, 2022

I reproed again with Open() and it does seem to happen, maybe not everytime but it does sometimes fail (notice the stacktrace):

System.Transactions.TransactionAbortedException: The transaction has aborted.
 ---> System.Transactions.TransactionPromotionException: Failure while attempting to promote transaction.
 ---> System.InvalidOperationException: The requested operation cannot be completed because the connection has been broken.
   at Microsoft.Data.SqlClient.SqlInternalConnectionTds.ExecuteTransaction(TransactionRequest transactionRequest, String name, IsolationLevel iso, SqlInternalTransaction internalTransaction, Boolean isDelegateControlRequest) in E:\sqlclient\src\Microsoft.Data.SqlClient\netcore\src\Microsoft\Data\SqlClient\SqlInternalConnectionTds.cs:line 992
   at Microsoft.Data.SqlClient.SqlDelegatedTransaction.Promote() in E:\sqlclient\src\Microsoft.Data.SqlClient\netcore\src\Microsoft\Data\SqlClient\SqlDelegatedTransaction.cs:line 157
   --- End of inner exception stack trace ---
   at Microsoft.Data.SqlClient.SqlDelegatedTransaction.Promote() in E:\sqlclient\src\Microsoft.Data.SqlClient\netcore\src\Microsoft\Data\SqlClient\SqlDelegatedTransaction.cs:line 228
   at System.Transactions.TransactionStatePSPEOperation.PSPEPromote(InternalTransaction tx)
   at System.Transactions.TransactionStateDelegatedBase.EnterState(InternalTransaction tx)
   --- End of inner exception stack trace ---
   at System.Transactions.TransactionStateAborted.CheckForFinishedTransaction(InternalTransaction tx)
   at System.Transactions.EnlistableStates.Promote(InternalTransaction tx)
   at System.Transactions.Transaction.Promote()
   at System.Transactions.TransactionInterop.ConvertToDistributedTransaction(Transaction transaction)
   at System.Transactions.TransactionInterop.GetExportCookie(Transaction transaction, Byte[] whereabouts)
   at Microsoft.Data.SqlClient.SqlInternalConnection.GetTransactionCookie(Transaction transaction, Byte[] whereAbouts) in E:\sqlclient\src\Microsoft.Data.SqlClient\src\Microsoft\Data\SqlClient\SqlInternalConnection.cs:line 753
   at Microsoft.Data.SqlClient.SqlInternalConnection.EnlistNonNull(Transaction tx) in E:\sqlclient\src\Microsoft.Data.SqlClient\src\Microsoft\Data\SqlClient\SqlInternalConnection.cs:line 590
   at Microsoft.Data.SqlClient.SqlInternalConnection.Enlist(Transaction tx) in E:\sqlclient\src\Microsoft.Data.SqlClient\src\Microsoft\Data\SqlClient\SqlInternalConnection.cs:line 460
   at Microsoft.Data.SqlClient.SqlInternalConnectionTds.Activate(Transaction transaction) in E:\sqlclient\src\Microsoft.Data.SqlClient\netcore\src\Microsoft\Data\SqlClient\SqlInternalConnectionTds.cs:line 887
   at Microsoft.Data.ProviderBase.DbConnectionInternal.ActivateConnection(Transaction transaction) in E:\sqlclient\src\Microsoft.Data.SqlClient\netcore\src\Microsoft\Data\ProviderBase\DbConnectionInternal.cs:line 221
   at Microsoft.Data.ProviderBase.DbConnectionPool.PrepareConnection(DbConnection owningObject, DbConnectionInternal obj, Transaction transaction) in E:\sqlclient\src\Microsoft.Data.SqlClient\netcore\src\Microsoft\Data\ProviderBase\DbConnectionPool.cs:line 1333
   at Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, UInt32 waitForMultipleObjectsTimeout, Boolean allowCreate, Boolean onlyOneCheckConnection, DbConnectionOptions userOptions, DbConnectionInternal& connection) in E:\sqlclient\src\Microsoft.Data.SqlClient\netcore\src\Microsoft\Data\ProviderBase\DbConnectionPool.cs:line 1317
   at Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal& connection) in E:\sqlclient\src\Microsoft.Data.SqlClient\netcore\src\Microsoft\Data\ProviderBase\DbConnectionPool.cs:line 1132
   at Microsoft.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection) in E:\sqlclient\src\Microsoft.Data.SqlClient\netcore\src\Microsoft\Data\ProviderBase\DbConnectionFactory.cs:line 122
   at Microsoft.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions) in E:\sqlclient\src\Microsoft.Data.SqlClient\netcore\src\Common\src\Microsoft\Data\ProviderBase\DbConnectionInternal.cs:line 341
   at Microsoft.Data.ProviderBase.DbConnectionClosed.TryOpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions) in E:\sqlclient\src\Microsoft.Data.SqlClient\netcore\src\Common\src\Microsoft\Data\ProviderBase\DbConnectionClosed.cs:line 39
   at Microsoft.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource`1 retry, SqlConnectionOverrides overrides) in E:\sqlclient\src\Microsoft.Data.SqlClient\netcore\src\Microsoft\Data\SqlClient\SqlConnection.cs:line 1844
   at Microsoft.Data.SqlClient.SqlConnection.Open(SqlConnectionOverrides overrides) in E:\sqlclient\src\Microsoft.Data.SqlClient\netcore\src\Microsoft\Data\SqlClient\SqlConnection.cs:line 1333
   at Microsoft.Data.SqlClient.SqlConnection.Open() in E:\sqlclient\src\Microsoft.Data.SqlClient\netcore\src\Microsoft\Data\SqlClient\SqlConnection.cs:line 1310

The problem seems to be the fact the driver or System.Transactions (to be precise) even attempts to promote transaction, where it should not and does not in a normal flow. The trigger scenario is when below call fails to enlist connection into delegated transaction and returns false which is possible if transaction is not yet aborted:

hasDelegatedTransaction = tx.EnlistPromotableSinglePhase(delegatedTransaction);

The transaction promotion is triggered by System.Transactions when the driver attempts to grab transaction cookie here:

As you can see it leads to calling at System.Transactions.TransactionInterop.ConvertToDistributedTransaction(Transaction transaction) in the stacktrace above.

cc @roji do you know why would System.Transactions try to make the transaction distributed, calling promote, in .NET 6?

@jetizz
Copy link

jetizz commented Feb 6, 2023

@DavoudEshtehari Tried setting the timeout to 180 seconds, issue still occurs. Also tried 60 seconds and 10 minutes (TransactionManager.DefaultTimeout and TransactionManager.MaximumTimeout) - both appear to make no difference.
As @ByteMe-web said, we can't apply previously stated workaround as we're running under linux (aspnet:7.0-bullseye-slim as base to be precise).
Again, I do not have a suite to measure and reliably repro this issue, but reading from logs, it occurs 2-3 times per 15k transactions. We do however use up to 20 (very short lived) connections during single transaction. Thought of rewriting how connections are used (roughly - scope to request instead of transient), but it would require significant effort. It should result in lower pool usage but no guarantee it would fully resolve this issue, so I'm reluctant.

@arical
Copy link

arical commented Mar 15, 2023

Mostly noting this down as another point of data for the team.

After upgrading our project to EF 7/SqlClient 5.1.0 from EF 6/SqlClient 5.0.1, we've started receiving this exception intermittently on our CI while running our test suite. It happens roughly once in 3 runs, which is annoying enough to be noticeable given we have had a fairly stable build up until this point.

Setup:

  • EF 7 / SqlClient 5.1.0
  • Our CI build agent is .NET 7 on linux (aws/codebuild/standard:6.0 image)
  • Not using distributed transaction
  • Not using MARS
  • Using connection pooling (max 200)
  • Timeout 30 seconds
  • SQL Server 2017 running in docker

@DavoudEshtehari
Copy link
Member

DavoudEshtehari commented Jun 6, 2023

I can't see the issue with the repro by upgrading to Windows 11 version 21H2 (OS Build 22000.1936). Please, share your results.

Update:
By increasing the number of cnn.OpenAsyn() calls to 20 on each thread, it started to break with the same exception after a few runs.

@DavoudEshtehari
Copy link
Member

This package from the PR #2301 can be used for verification. Please, give it a try as this issue varies across different machines.

@sdrapkin
Copy link

@DavoudEshtehari I have tested Nuget 5.20.0-pull.101801 (Windows 10/64bit with .NET 8.0.1) and it consistently does not throw (ie. works). It also works for me on latest .NET 7 and .NET 6. 💚

I have also tested the exact same repro-code with Nuget 5.2.0-preview4.233342.2, and it consistently throws on every run. 💥
Thus, so far this PR is very encouraging, thanks for the work.

Can this fix please be ported to the System.Data.SqlClient as well?

@DavoudEshtehari
Copy link
Member

@sdrapkin thanks for the prompt reply.
This issue doesn't apply to S.D.SqlClient because it hasn't received another fix through PR #543.
Generally, S.D.SqlClient is currently under support and only addresses significant bugs and security issues.

@sdrapkin
Copy link

Generally, S.D.SqlClient is currently under support and only addresses significant bugs and security issues.

IMHO this is a significant bug, and Microsoft should fix it in S.D.SqlClient as well (which a vast number of enterprise software still depends on).

@DavoudEshtehari
Copy link
Member

I'll bring up your concern in the next bug triage meeting and will update here if there's a different outcome.

@joostmeijles
Copy link
Author

This package from the PR #2301 can be used for verification. Please, give it a try as this issue varies across different machines.

Verified on my machine:
.NET target: .NET 6
SQL Server version: SQL Server 2019
Operating system: Windows 11

Works like a charm! Fantastic work @DavoudEshtehari 🥇

SqlClient Triage Board automation moved this from Under Investigation to Closed Jan 19, 2024
@razvalex
Copy link

razvalex commented Jan 25, 2024

First thank you very much for prioritizing this issue appropriately. I just wanted to share what we've encountered till now in our tests;

Some time ago we migrated from System.Data.SqlClient to Microsoft.Data.SqlClient 3.1.1, which worked as expected for us (we're using transaction scope quite intensively and we're trying to migrate away);

Recently, due to https://msrc.microsoft.com/update-guide/vulnerability/CVE-2024-0056, we tried to upgrade from 3.1.1 to 3.1.5;

We started to encounter two different issues from time to time (~20 times per day out of ~2M requests & tens of thousands background tasks):

  1. System.InvalidOperationException: The operation is not valid for the current state of the enlistment.
[{"assembly":"System.Transactions.Local, Version=8.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51","method":"System.Transactions.EnlistmentState.Committed","level":0,"line":39,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Transactions.Local/src/System/Transactions/EnlistmentState.cs"},{"assembly":"System.Transactions.Local, Version=8.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51","method":"System.Transactions.SinglePhaseEnlistment.Committed","level":1,"line":55,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Transactions.Local/src/System/Transactions/SinglePhaseEnlistment.cs"},{"assembly":"Microsoft.Data.SqlClient, Version=3.0.0.0, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5","method":"Microsoft.Data.SqlClient.SqlDelegatedTransaction.SinglePhaseCommit","level":2,"line":0},{"assembly":"System.Transactions.Local, Version=8.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51","method":"System.Transactions.DurableEnlistmentCommitting.EnterState","level":3,"line":160,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Transactions.Local/src/System/Transactions/DurableEnlistmentState.cs"},{"assembly":"System.Transactions.Local, Version=8.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51","method":"System.Transactions.CommittableTransaction.Commit","level":4,"line":82,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Transactions.Local/src/System/Transactions/CommittableTransaction.cs"},{"assembly":"System.Transactions.Local, Version=8.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51","method":"System.Transactions.TransactionScope.InternalDispose","level":5,"line":800,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Transactions.Local/src/System/Transactions/TransactionScope.cs"},{"assembly":"System.Transactions.Local, Version=8.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51","method":"System.Transactions.TransactionScope.Dispose","level":6,"line":738,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Transactions.Local/src/System/Transactions/TransactionScope.cs"}]
  1. System.InvalidOperationException: The requested operation cannot be completed because the connection has been broken.
[{"severityLevel":"Error","outerId":"0","message":"System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation.","type":"System.Reflection.TargetInvocationException","id":"64648675","parsedStack":[{"assembly":"System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e","method":"System.Reflection.MethodBaseInvoker.InvokeWithNoArgs","level":0,"line":61,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Private.CoreLib/src/System/Reflection/MethodBaseInvoker.cs"},{"assembly":"Microsoft.Data.SqlClient, Version=3.0.0.0, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5","method":"Microsoft.Data.SqlClient.SqlInternalConnection.EnlistNonNull","level":1,"line":0},{"assembly":"Microsoft.Data.SqlClient, Version=3.0.0.0, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5","method":"Microsoft.Data.ProviderBase.DbConnectionPool.PrepareConnection","level":2,"line":0},{"assembly":"Microsoft.Data.SqlClient, Version=3.0.0.0, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5","method":"Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection","level":3,"line":0},{"assembly":"Microsoft.Data.SqlClient, Version=3.0.0.0, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5","method":"Microsoft.Data.ProviderBase.DbConnectionFactory.TryGetConnection","level":4,"line":0},{"assembly":"Microsoft.Data.SqlClient, Version=3.0.0.0, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5","method":"Microsoft.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal","level":5,"line":0},{"assembly":"Microsoft.Data.SqlClient, Version=3.0.0.0, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5","method":"Microsoft.Data.SqlClient.SqlConnection.InternalOpenAsync","level":6,"line":0},{"assembly":"System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e","method":"System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw","level":7,"line":53,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Private.CoreLib/src/System/Runtime/ExceptionServices/ExceptionDispatchInfo.cs"},{"assembly":"System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e","method":"System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess","level":8,"line":154,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/TaskAwaiter.cs"},{"assembly":"System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e","method":"System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification","level":9,"line":118,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/TaskAwaiter.cs"},...,"method":"System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess","level":12,"line":154,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/TaskAwaiter.cs"},...,{"severityLevel":"Error","outerId":"64648675","message":"The transaction has aborted.","type":"System.Transactions.TransactionAbortedException","id":"53313168","parsedStack":[{"assembly":"System.Transactions.Local, Version=8.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51","method":"System.Transactions.TransactionStateAborted.CheckForFinishedTransaction","level":0,"line":1493,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Transactions.Local/src/System/Transactions/TransactionState.cs"},{"assembly":"System.Transactions.Local, Version=8.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51","method":"System.Transactions.EnlistableStates.PromotedToken","level":1,"line":662,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Transactions.Local/src/System/Transactions/TransactionState.cs"},{"assembly":"System.Transactions.Local, Version=8.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51","method":"System.Transactions.Transaction.GetPromotedToken","level":2,"line":462,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Transactions.Local/src/System/Transactions/Transaction.cs"},{"assembly":"System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e","method":"InvokeStub_Transaction.GetPromotedToken","level":3,"line":0},{"assembly":"System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e","method":"System.Reflection.MethodBaseInvoker.InvokeWithNoArgs","level":4,"line":0}]},{"severityLevel":"Error","outerId":"53313168","message":"Failure while attempting to promote transaction.","type":"System.Transactions.TransactionPromotionException","id":"58585774","parsedStack":[{"assembly":"Microsoft.Data.SqlClient, Version=3.0.0.0, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5","method":"Microsoft.Data.SqlClient.SqlDelegatedTransaction.Promote","level":0,"line":0},{"assembly":"System.Transactions.Local, Version=8.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51","method":"System.Transactions.TransactionStatePSPEOperation.PSPEPromote","level":1,"line":4423,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Transactions.Local/src/System/Transactions/TransactionState.cs"},{"assembly":"System.Transactions.Local, Version=8.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51","method":"System.Transactions.TransactionStateDelegatedNonMSDTC.EnterState","level":2,"line":4264,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Transactions.Local/src/System/Transactions/TransactionState.cs"}]},{"severityLevel":"Error","outerId":"58585774","message":"The requested operation cannot be completed because the connection has been broken.","type":"System.InvalidOperationException","id":"40702182","parsedStack":[{"assembly":"Microsoft.Data.SqlClient, Version=3.0.0.0, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5","method":"Microsoft.Data.SqlClient.SqlInternalConnectionTds.ExecuteTransaction","level":0,"line":0},{"assembly":"Microsoft.Data.SqlClient, Version=3.0.0.0, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5","method":"Microsoft.Data.SqlClient.SqlDelegatedTransaction.Promote","level":1,"line":0}]}]

We also tried to migrate to 5.1.x/5.2.x, but I think that these issues were much more common (~30 times per day); Of course all these metrics might be influenced by actual usage (we tried the migration in various environments which are more or less similar in terms of infrastructure, etc);

Based on my understanding this PR: #1801 (3.1.x: #1912) introduced a race condition between connection.CleanupConnectionOnTransactionCompletion and enlistment.Committed (which was moved outside of connection lock) as after the initial cleanup, the connection will be returned to the default pool, enabling other threads to utilize it, while the subsequent cleanup will identify another ongoing transaction.

I saw that 5.2.0-preview5.24024.3 is up for grabs and we will give it a try these days (max. early next week);

I wanted to ask if there are plans to backport #2301 to older versions like 3.1.x. @DavoudEshtehari Do you know something about it at this moment, or is it too early?

Setup:
EF 6 / SqlClient 3.1.5 or SqlClient 5.1.x / 5.2.x
.NET 8
Not using distributed transaction
Not using MARS
Using connection pooling
Timeout 30 seconds
Azure SQL Database

@DavoudEshtehari
Copy link
Member

Hi @razvalex, This fix is slated to be considered for backporting to all the supported servicing versions, with the initial release expected to be 5.1.5. However, there is no specific timeline set for these releases as of now.

@razvalex
Copy link

razvalex commented Jan 31, 2024

@DavoudEshtehari, I wanted to inform you that we evaluated the performance of https://github.com/dotnet/SqlClient/releases/tag/v5.2.0-preview5 in one of our environments, which handles approximately 2 million daily requests and tens of thousands of background tasks. The results were very positive. Thanks again!

For us it would be helpful to also have #2301 backported to 3.1.x, but I think we can also manage to update everything to 5.1.5 if that won't be the case.

@ErikEJ
Copy link
Contributor

ErikEJ commented Jan 31, 2024

@razvalex 5.1.5 has been released

@razvalex
Copy link

razvalex commented Jan 31, 2024

@ErikEJ Thank you. This week, we're aiming to upgrade the majority of our projects to version 5.1.5. I'll provide you with an update on the outcomes soon (mid-February). Backporting #2301 to version 3.1.x would help us (for example) to avoid extra handling for Encrypt (#1210) in certain older projects.

@JRahnama
Copy link
Member

@razvalex eventually the fix will be backported to 3.1, date yet to be determined.

@DavoudEshtehari
Copy link
Member

@razvalex Thanks for sharing the result. It sounds promising.
I understand the challenges of upgrading with breaking changes, and I encourage you to consider moving on to the latest stable version sooner rather than later to gain the benefits of the improvements.

@razvalex
Copy link

We were able to upgrade the majority of our projects from 3.1.5 to version 5.1.5 with great success; This week we are planning to upgrade from 5.1.5 to 5.2.0, mainly due to increased performance in some key areas (ex: #1544).

I was wondering if there are any plans for backporting #2301 to 3.1.x as some of our older projects might benefit having this fix in place. Also, upgrading them to a newer version means dealing with breaking changes (ex: Encrypt) and might be riskier.

@David-Engel David-Engel moved this from To do to Done in SqlClient v3.1.6 hotfix May 16, 2024
@David-Engel David-Engel moved this from Done to To do in SqlClient v3.1.6 hotfix May 16, 2024
@David-Engel David-Engel moved this from To do to Done in SqlClient v3.1.6 hotfix May 16, 2024
@David-Engel David-Engel moved this from To do to Done in SqlClient v4.0.6 hotfix May 16, 2024
@David-Engel David-Engel moved this from Done to Issue Backported in SqlClient v4.0.6 hotfix May 16, 2024
@David-Engel David-Engel moved this from Done to Backport Created in SqlClient v3.1.6 hotfix May 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
SqlClient v3.1.6 hotfix
Backport Created
SqlClient v4.0.6 hotfix
Issue Backported