Ask a question
I am running a group of six AppService applications on Azure and have been running into a very difficult-to-diagnose issue that occurs after a deployment to one of the applications in this group. We've been trying to track this down since September of last year. All the AppServices are configured with Managed Identity, and my code does no managed identity or token management of its own beyond the connection string:
Server=<server>.database.windows.net;Authentication=Active Directory Managed Identity;Encrypt=True;Database=<database>
After deployment, several seconds pass and then the following error occurs on any attempted database call:
Microsoft.Data.SqlClient.SqlException (0x80131904): Login failed for user '<token-identified principal>'.
at Microsoft.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
at Microsoft.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj)
at Microsoft.Data.SqlClient.SqlInternalConnectionTds.CompleteLogin(Boolean enlistOK)
at Microsoft.Data.SqlClient.SqlInternalConnectionTds.AttemptOneLogin(ServerInfo serverInfo, String newPassword, SecureString newSecurePassword, Boolean ignoreSniOpenTimeout, TimeoutTimer timeout, Boolean withFailover)
at Microsoft.Data.SqlClient.SqlInternalConnectionTds.LoginNoFailover(ServerInfo serverInfo, String newPassword, SecureString newSecurePassword, Boolean redirectedUserInstance, SqlConnectionString connectionOptions, SqlCredential credential, TimeoutTimer timeout)
at Microsoft.Data.SqlClient.SqlInternalConnectionTds.OpenLoginEnlist(TimeoutTimer timeout, SqlConnectionString connectionOptions, SqlCredential credential, String newPassword, SecureString newSecurePassword, Boolean redirectedUserInstance)
at Microsoft.Data.SqlClient.SqlInternalConnectionTds..ctor(DbConnectionPoolIdentity identity, SqlConnectionString connectionOptions, SqlCredential credential, Object providerInfo, String newPassword, SecureString newSecurePassword, Boolean redirectedUserInstance, SqlConnectionString userConnectionOptions, SessionData reconnectSessionData, Boolean applyTransientFaultHandling, String accessToken, DbConnectionPool pool)
at Microsoft.Data.SqlClient.SqlConnectionFactory.CreateConnection(DbConnectionOptions options, DbConnectionPoolKey poolKey, Object poolGroupProviderInfo, DbConnectionPool pool, DbConnection owningConnection, DbConnectionOptions userOptions)
at Microsoft.Data.ProviderBase.DbConnectionFactory.CreatePooledConnection(DbConnectionPool pool, DbConnection owningObject, DbConnectionOptions options, DbConnectionPoolKey poolKey, DbConnectionOptions userOptions)
at Microsoft.Data.ProviderBase.DbConnectionPool.CreateObject(DbConnection owningObject, DbConnectionOptions userOptions, DbConnectionInternal oldConnection)
at Microsoft.Data.ProviderBase.DbConnectionPool.UserCreateRequest(DbConnection owningObject, DbConnectionOptions userOptions, DbConnectionInternal oldConnection)
at Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, UInt32 waitForMultipleObjectsTimeout, Boolean allowCreate, Boolean onlyOneCheckConnection, DbConnectionOptions userOptions, DbConnectionInternal& connection)
at Microsoft.Data.ProviderBase.DbConnectionPool.WaitForPendingOpen()
--- End of stack trace from previous location ---
at Microsoft.EntityFrameworkCore.Storage.RelationalConnection.OpenInternalAsync(Boolean errorsExpected, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.Storage.RelationalConnection.OpenInternalAsync(Boolean errorsExpected, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.Storage.RelationalConnection.OpenAsync(CancellationToken cancellationToken, Boolean errorsExpected)
at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReaderAsync(RelationalCommandParameterObject parameterObject, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable`1.AsyncEnumerator.InitializeReaderAsync(AsyncEnumerator enumerator, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.SqlServer.Storage.Internal.SqlServerExecutionStrategy.ExecuteAsync[TState,TResult](TState state, Func`4 operation, Func`4 verifySucceeded, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable`1.AsyncEnumerator.MoveNextAsync()
These errors last about 5 minutes, after which database calls succeed again. They occur on all AppServices that use ASP.NET MVC/Blazor (3 of them total), not just on the application that was deployed.
As an example, call my six applications A, B, C, D, E, and F: if I deploy Application A, Applications B and C will start throwing exceptions, maybe alongside Application A.
I emphasize maybe because this exception is a bit of a tricky beast. It is very inconsistent: I will go weeks without seeing it, and then it will occur with great regularity for several days (and then go away again).
As it occurs in production, I have had to move my deployments to quiet times to mitigate its impact. Even so, it still happens and customers notice.
Both the Azure AppServices and Azure SQL teams have looked at this several times since I reported it in September, and there has been no progress on tracking down what is going on.
Yesterday, I noticed that after a deployment on one of my applications, the exception occurred, but only on a part of my page. This page was running two queries each with its own DbContext and one worked, and the other did not, throwing this exception.
This got me thinking about PooledDbContextFactory, because I use that in my applications. It would seem that the pool is returning a DbContext with an invalid token while the other ones are OK, leading to the errors. I say seem because it's very difficult to know what's going on (hence my reaching out here for more information).
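For readers unfamiliar with this setup, here is a minimal sketch of how a pooled factory is typically registered with a managed identity connection string like the one above. It is not the actual code from these applications: `AppDbContext` and the connection string name `"Default"` are hypothetical placeholders.

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;

var builder = WebApplication.CreateBuilder(args);

// "Default" is a hypothetical name; the value would be the managed identity
// connection string shown earlier (Authentication=Active Directory Managed Identity).
var connectionString = builder.Configuration.GetConnectionString("Default");

// Registers IDbContextFactory<AppDbContext> backed by a pool of context instances.
// Pooling reuses DbContext objects; the underlying SqlConnection pooling and the
// token acquisition are handled separately by Microsoft.Data.SqlClient.
builder.Services.AddPooledDbContextFactory<AppDbContext>(options =>
    options.UseSqlServer(connectionString));

var app = builder.Build();
app.Run();

// Hypothetical context type; pooled contexts need a single public constructor
// that takes DbContextOptions.
public class AppDbContext : DbContext
{
    public AppDbContext(DbContextOptions<AppDbContext> options) : base(options) { }
}
```

Each query on a page would then call `CreateDbContext()` on the injected `IDbContextFactory<AppDbContext>` and dispose the context when done, which matches the "two queries, each with its own DbContext" situation described above.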
I am under the impression that when AppService deploys, it restarts the .NET process, so every PooledDbContextFactory, its pooled contexts, and their connections are discarded. Is this correct? Is there some way that a PooledDbContextFactory and its instances + connections can survive an AppService deployment?
What I am mostly interested in is whether a PooledDbContextFactory, along with its contexts + connections, can survive a deployment and lead to this condition.
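One way to gather evidence for or against this hypothesis might be to log the process ID and process start time whenever the login failure occurs. If a failing request is served by a process that started before the deployment, the old process (and its pools) survived. The following is only a sketch: `LoginFailureDiagnostics` and `RunWithLoginDiagnosticsAsync` are hypothetical names, and it assumes the exception surfaces directly as a `SqlException`, as in the stack trace above.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;
using Microsoft.Data.SqlClient;
using Microsoft.Extensions.Logging;

public static class LoginFailureDiagnostics
{
    // SQL Server reports "Login failed for user ..." as error number 18456.
    private const int LoginFailedErrorNumber = 18456;

    // Runs a query delegate and, on a login failure, logs details that help
    // tell whether the current process predates the deployment.
    public static async Task<T> RunWithLoginDiagnosticsAsync<T>(
        Func<Task<T>> query, ILogger logger)
    {
        try
        {
            return await query();
        }
        catch (SqlException ex) when (ex.Number == LoginFailedErrorNumber)
        {
            using var process = Process.GetCurrentProcess();

            logger.LogError(ex,
                "Login failed. ProcessId={ProcessId} ProcessStartUtc={ProcessStartUtc} ClientConnectionId={ClientConnectionId}",
                Environment.ProcessId,
                process.StartTime.ToUniversalTime(),
                ex.ClientConnectionId);

            throw; // Rethrow so the caller's behavior is unchanged.
        }
    }
}
```

Comparing the logged `ProcessStartUtc` with the deployment timestamp would show whether the failing contexts belong to a process that outlived the deployment. The `ClientConnectionId` can also be passed to support teams to correlate with server-side logs.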
Any insight/pointers/suggestion you can provide would be greatly appreciated.
Include provider and version information
EF Core version: 6.0.14
Database provider: Microsoft.EntityFrameworkCore.SqlServer
Target framework: (e.g. .NET 7.0)
Operating system: Azure
IDE: Visual Studio 2022 17.5