How and when exactly does the SDK switch the primary read region? #3108
Replies: 2 comments 1 reply
-
The diagnostics you attached are not from the V3 SDK (not from this repo), but they align with https://docs.microsoft.com/en-us/azure/cosmos-db/sql/troubleshoot-sdk-availability#transient-connectivity-issues-on-tcp-protocol. That document describes the scenarios in which the SDK performs partial or complete failovers, and the cases in which the SDK goes back to the original region once it becomes available again. For the specific error codes, see: https://github.com/Azure/azure-cosmos-dotnet-v3/blob/master/Microsoft.Azure.Cosmos/src/ClientRetryPolicy.cs. If a particular partition is having issues, these normally surface as TCP connectivity errors, which eventually cause that particular request to be retried in another region, if possible (see the first link above).
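To make the per-request behavior concrete, here is a minimal sketch, assuming a simplified model rather than the SDK's actual `ClientRetryPolicy` logic. `TransientConnectivityError` and `send_with_cross_region_retry` are hypothetical names. The point it illustrates: a transient TCP error only moves *that one request* to the next preferred region; no shared routing state changes, so the next request starts again at the first region.

```python
from typing import Callable, List, Optional, Tuple


class TransientConnectivityError(Exception):
    """Hypothetical stand-in for a TCP-level failure (timeout, connection reset)."""


def send_with_cross_region_retry(
    preferred_regions: List[str],
    send: Callable[[str], str],
) -> Tuple[str, List[str]]:
    """Try one request region by region; return (response, regions attempted).

    Only this request falls through to the next region. Nothing is recorded
    globally, so every new call starts again at preferred_regions[0].
    """
    attempted: List[str] = []
    last_error: Optional[Exception] = None
    for region in preferred_regions:
        attempted.append(region)
        try:
            return send(region), attempted
        except TransientConnectivityError as err:
            last_error = err  # remember the failure and try the next region
    # Every preferred region failed (or none were configured).
    raise last_error if last_error else RuntimeError("no regions configured")


if __name__ == "__main__":
    def flaky_send(region: str) -> str:
        # Simulated partial outage: the first region times out.
        if region == "West Europe":
            raise TransientConnectivityError("tcp timeout")
        return "ok from " + region

    regions = ["West Europe", "North Europe"]
    print(send_with_cross_region_retry(regions, flaky_send))
    # ('ok from North Europe', ['West Europe', 'North Europe'])
    print(send_with_cross_region_retry(regions, flaky_send))
    # same again: the second request also pays for the West Europe attempt
```

Under this model every request pays the latency of the failed first attempt, which is consistent with the high end-to-end latencies described in the question.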
-
The nuance of how exactly the SDK detects the "Regional outage" case is still not crystal clear to me. Can there be a partial outage that causes a significant issue (say, affecting 10% of traffic) that looks like "Transient connectivity issues on TCP protocol" and is therefore not treated as a "Regional outage"? I assume that in this case the SDK will keep trying the first region for each request, without a full switch, right?
-
Hi,
Can I please get a clarification on this: https://docs.microsoft.com/en-us/azure/cosmos-db/high-availability#additional-information-on-read-region-outages
"The impacted region is automatically disconnected and will be marked offline. The Azure Cosmos DB SDKs will redirect read calls to the next available region in the preferred region list."
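The quoted documentation describes a client-wide switch rather than a per-request retry: once a region is marked offline, reads go to the next available region in the preferred list. A minimal sketch of that idea, assuming a simplified model: `RegionRouter`, `mark_unavailable`, `read_region`, and the `unavailable_for_s` expiry window are all hypothetical, not the SDK's API or its actual timeout value. The expiry stands in for the behavior the answer mentions, where the SDK returns to the original region once it is available again.

```python
from typing import Dict, List


class RegionRouter:
    """Hypothetical model of client-wide read routing over a preferred region list."""

    def __init__(self, preferred_regions: List[str], unavailable_for_s: float = 300.0):
        self.preferred_regions = list(preferred_regions)
        self.unavailable_for_s = unavailable_for_s  # assumed value, for illustration only
        self._unavailable_until: Dict[str, float] = {}

    def mark_unavailable(self, region: str, now: float) -> None:
        # From now on *every* read skips this region until the mark expires.
        self._unavailable_until[region] = now + self.unavailable_for_s

    def read_region(self, now: float) -> str:
        # First preferred region whose offline mark is absent or expired.
        for region in self.preferred_regions:
            if self._unavailable_until.get(region, 0.0) <= now:
                return region
        # Assumption: if all regions are marked offline, fall back to the first one.
        return self.preferred_regions[0]


if __name__ == "__main__":
    router = RegionRouter(["West Europe", "North Europe"], unavailable_for_s=300.0)
    print(router.read_region(now=0.0))    # West Europe
    router.mark_unavailable("West Europe", now=10.0)
    print(router.read_region(now=20.0))   # North Europe (first attempt already switched)
    print(router.read_region(now=400.0))  # West Europe (mark expired)
```

The difference from a per-request retry is that the mark is shared state: after it is set, first attempts no longer go to the affected region at all.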
During some minor CosmosDB incidents we've seen in production, I've seen retry patterns like these: the SDK always sent the request to the affected region first and then retried in another region, but I never saw a case where the SDK switched regions completely for the first attempt. The end-to-end latency of such requests was usually high (from several hundred milliseconds to several seconds):
Can somebody please explain: