Update SQL Server transient error list #25050

Closed
AndriySvyryd opened this issue Jun 7, 2021 · 3 comments · Fixed by #25832
Labels: area-sqlserver, closed-fixed, customer-reported, type-bug
Milestone: 6.0.0

Comments

AndriySvyryd (Member) commented Jun 7, 2021

Originally filed at dotnet/SqlClient#649 (comment)

Please consider adding the following errors, which EF Core's SqlServerTransientExceptionDetector does not currently recognize:

| Number | Severity | Message | Reasoning |
|---|---|---|---|
| 601 | 12 | "Could not continue scan with NOLOCK due to data movement." | Advice in Hints (Transact-SQL) - Table |
| 617 | 20 | "Descriptor for object ID %ld in database ID %d not found in the hash table during attempt to unhash it. A work table is missing an entry. Rerun the query. If a cursor is involved, close and reopen the cursor." | "Rerun" |
| 669 | 22 | "The row object is inconsistent. Please rerun the query." | "Rerun" |
| 921 | 14 | "Database '%.*ls' has not been recovered yet. Wait and try again." | "Try again" |
| 1203 | 20 | "Process ID %d attempted to unlock a resource it does not own: %.*ls. Retry the transaction, because this error may be caused by a timing condition. If the problem persists, contact the database administrator." | "Retry" |
| 1204 | 19 | "The instance of the SQL Server Database Engine cannot obtain a LOCK resource at this time. Rerun your statement when there are fewer active users. Ask the database administrator to check the lock and memory configuration for this instance, or to check for long-running transactions." | "Rerun" |
| 1221 | 20 | "The Database Engine is attempting to release a group of locks that are not currently held by the transaction. Retry the transaction. If the problem persists, contact your support provider." | "Retry" |
| 1222 | 16 | "Lock request time out period exceeded." | "Time out" |
| 3935 | 16 | "A FILESTREAM transaction context could not be initialized. This might be caused by a resource shortage. Retry the operation. Error code: 0x%x." | "Retry" |
| 3960 | 16 | "Snapshot isolation transaction aborted due to update conflict. You cannot use snapshot isolation to access table '%.*ls' directly or indirectly in database '%.*ls' to update, delete, or insert the row that has been modified or deleted by another transaction. Retry the transaction or change the isolation level for the update/delete statement." | "Retry" |
| 3966 | 17 | "Transaction is rolled back when accessing version store. It was earlier marked as victim when the version store was shrunk due to insufficient space in tempdb. This transaction was marked as a victim earlier because it may need the row version(s) that have already been removed to make space in tempdb. Retry the transaction" | "Retry" |
| 8628 | 17 | "A time out occurred while waiting to optimize the query. Rerun the query." | "Rerun" |
| 8645 | 17 | "A timeout occurred while waiting for memory resources to execute the query in resource pool '%ls' (%ld). Rerun the query." | "Rerun" |
| 8651 | 17 | "Could not perform the operation because the requested memory grant was not available in resource pool '%ls' (%ld). Rerun the query, reduce the query load, or check resource governor configuration setting." | "Rerun" |
| 9515 | 16 | "An XML schema has been altered or dropped, and the query plan is no longer valid. Please rerun the query batch." | "Rerun" |
| 10922 | 16 | "%ls failed. Rerun the statement." | "Rerun" |
| 14355 | 16 | "The MSSQLServerADHelper service is busy. Retry this operation later." | "Retry" |
| 17197 | 16 | "Login failed due to timeout; the connection has been closed. This error may indicate heavy server load. Reduce the load on the server and retry login.%.*ls" | "Retry" |
| 20041 | 16 | "Transaction rolled back. Could not execute trigger. Retry your transaction." | "Retry" |

Also, if SqlClient does not recognize a specific error number, perhaps it could still treat some of the Database Engine error severities as transient, such as 13 (deadlock) or 17 (out of resources).
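A minimal Python sketch of the suggestion above, assuming a hypothetical `is_transient(number, severity)` check: the specific error number is consulted first, and only an unrecognized number falls back to severity 13 or 17. The set holds only the numbers proposed in the table, not EF Core's existing list, and none of these names come from EF Core or SqlClient.

```python
# Hypothetical sketch: not EF Core's or SqlClient's actual implementation.
# Only the error numbers proposed in the table above (EF Core's existing
# transient list is deliberately omitted).
PROPOSED_TRANSIENT_NUMBERS = frozenset({
    601, 617, 669, 921, 1203, 1204, 1221, 1222, 3935, 3960,
    3966, 8628, 8645, 8651, 9515, 10922, 14355, 17197, 20041,
})

# Severity fallback suggested in the comment: 13 (deadlock), 17 (out of resources).
FALLBACK_TRANSIENT_SEVERITIES = frozenset({13, 17})


def is_transient(number: int, severity: int) -> bool:
    """Return True if an error looks worth retrying.

    The specific error number is checked first; the severity-based
    fallback applies only when the number is not recognized.
    """
    if number in PROPOSED_TRANSIENT_NUMBERS:
        return True
    return severity in FALLBACK_TRANSIENT_SEVERITIES
```

For example, `is_transient(3960, 16)` is true because the number is listed, while an unlisted number is treated as transient only if its severity is 13 or 17.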

AndriySvyryd (Member, Author) commented Jun 7, 2021

Also consider the following:

-2 Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding. The statement has been terminated.

997 A connection was successfully established with the server, but then an error occurred during the login process. (provider: Named Pipes Provider, error: 0 - Overlapped I/O operation is in progress)

1807 Could not obtain exclusive lock on database 'model'. Retry the operation later.

4060 Cannot open database "%.*ls" requested by the login. The login failed.

4221 Login to read-secondary failed due to long wait on 'HADR_DATABASE_WAIT_FOR_TRANSITION_TO_VERSIONING'. The replica is not available for login because row versions are missing for transactions that were in-flight when the replica was recycled. The issue can be resolved by rolling back or committing the active transactions on the primary replica. Occurrences of this condition can be minimized by avoiding long write transactions on the primary.

ajcvickers added this to the 6.0.0 milestone Jun 8, 2021
AndriySvyryd (Member, Author) commented
After discussion, we've decided to add the above errors to the transient list, since retrying is the appropriate action for them in most cases and shouldn't have significant negative consequences when the error turns out not to be transient. The exceptions are:

  • -2 (Timeout) - There's a high chance of it being caused by user error, especially during initial development.

  • 1222 (Lock request time out) - By default, LOCK_TIMEOUT is set to -1 (infinite), so this error only occurs when the user has explicitly set it to a different value.
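The decision above can be restated as plain set arithmetic. This Python sketch uses hypothetical set names and is not the EF Core source; it only shows which of the proposed numbers end up in the transient list once the two exclusions are applied.

```python
# Hypothetical restatement of the decision; not the actual EF Core source.
# Numbers proposed in the first comment's table.
FIRST_COMMENT = {
    601, 617, 669, 921, 1203, 1204, 1221, 1222, 3935, 3960,
    3966, 8628, 8645, 8651, 9515, 10922, 14355, 17197, 20041,
}

# Numbers proposed in the follow-up comment.
FOLLOW_UP = {-2, 997, 1807, 4060, 4221}

# Excluded after discussion:
#   -2   (timeout): frequently caused by user error during development
#   1222 (lock request time out): only occurs if LOCK_TIMEOUT was set explicitly
EXCLUDED = {-2, 1222}

# Everything proposed, minus the exclusions.
NEWLY_TRANSIENT = frozenset((FIRST_COMMENT | FOLLOW_UP) - EXCLUDED)
```

With these inputs, 22 numbers are added; -2 and 1222 are not among them.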

AndriySvyryd added the closed-fixed label Sep 1, 2021
AndriySvyryd removed their assignment Sep 1, 2021
AndriySvyryd added a commit that referenced this issue Sep 1, 2021
AndriySvyryd added a commit that referenced this issue Sep 1, 2021
ajcvickers modified the milestones: 6.0.0, 6.0.0-rc2 Sep 2, 2021
ajcvickers added a commit to dotnet/aspnetcore that referenced this issue Sep 21, 2021
Sometimes EF (or other libraries) wrap database errors. The exception filter should account for this but did not. This was revealed by dotnet/efcore#25050, where we started treating more error numbers as transient and hence wrapping their exceptions.

Note that the original Diagnostics.EFCore.FunctionalTests have a test for this, but it appears that these tests were never updated when the mechanism was changed in .NET 5.
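The wrapping problem can be illustrated with a small Python sketch, assuming hypothetical `DatabaseError` and `RetryLimitExceeded` classes as stand-ins for a provider exception and the wrapper thrown once retries are exhausted. A filter that inspects only the outermost exception misses the database error; walking the cause chain (Python's `__cause__`, roughly analogous to .NET's `InnerException`) finds it. This is not the ASP.NET Core filter itself.

```python
from typing import Optional


class DatabaseError(Exception):
    """Hypothetical stand-in for a provider exception carrying an error number."""

    def __init__(self, number: int) -> None:
        super().__init__(f"database error {number}")
        self.number = number


class RetryLimitExceeded(Exception):
    """Hypothetical stand-in for the wrapper thrown after retries run out."""


def find_database_error(exc: Optional[BaseException]) -> Optional[DatabaseError]:
    """Walk the explicit cause chain and return the first DatabaseError found."""
    seen = set()  # guards against a pathological cyclic cause chain
    while exc is not None and id(exc) not in seen:
        if isinstance(exc, DatabaseError):
            return exc
        seen.add(id(exc))
        exc = exc.__cause__
    return None
```

For example, if `RetryLimitExceeded("retries exhausted")` is raised `from` a `DatabaseError(4060)`, passing the wrapper to `find_database_error` returns the inner error, whereas an `isinstance(outer, DatabaseError)` check on the wrapper alone would fail.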
ajcvickers removed this from the 6.0.0-rc2 milestone Sep 21, 2021
ajcvickers (Member) commented
Consider documenting as a breaking change.

Pilchie pushed a commit to dotnet/aspnetcore that referenced this issue Sep 21, 2021
ajcvickers added this to the 6.0.0-rc2 milestone Oct 13, 2021
ajcvickers modified the milestones: 6.0.0-rc2, 6.0.0 Nov 8, 2021